Goals
- Understanding Big Data and its challenges
- Knowing how to deploy Hadoop and its ecosystem
- Understanding HDFS and MapReduce
- Structuring data with HBase
- Writing queries with HiveQL
- Running an analysis with Pig
Program
What is Big Data?
Data sources: human and machine
The problem of size
Hadoop’s position in the landscape
The origin of the project
The HDFS filesystem
Understanding the MapReduce algorithm
The Hadoop environment: HBase, ZooKeeper, Hive, Pig…
The YARN API
From stand-alone mode to fully distributed clustered mode
Prerequisites, Hadoop distributions
Hadoop cluster: NameNode, ResourceManager, DataNode, NodeManager
Configuration files
Basic operations on the HDFS cluster: formatting, starting, stopping
Practical workshop: install Hadoop on 2 nodes, format and manipulate HDFS
The benefits of MapReduce
Mappers, reducers, parallelism and independence of processing
Inputs, outputs
Submission of a job to Hadoop
Practical workshop: running a task via MapReduce, with output to HDFS
Random, real-time read/write access to Big Data
Features of HBase, NoSQL
Prerequisites, configuration
Handling via the HBase shell
Practical workshop: setting up HBase on Hadoop, creating and handling a table
Presentation of Hive
Manage the schema: databases, tables, views, partitions
Data manipulation, queries and MapReduce with HiveQL
Auditing and error logs
Practical workshop: loading big data into Hive and querying it
Presentation, installation of the Apache Pig project
Running Pig locally and in MapReduce mode
Scripting for Pig
The Pig Latin language
Data manipulation and storage with Pig
Practical workshop: write a Pig Latin script for a simple task and execute it locally, then in MapReduce mode
Managing logs and auditing Hadoop tasks
Discover MRUnit for unit tests in Hadoop
Local debugging
Performance monitoring
Practical workshop: setting up a more complex MapReduce job with traces and unit tests
Duration
4 days
Price
£1994
Audience
System administrators
Prerequisites
Knowledge of system administration; knowledge of Java is preferable
Reference
BUS869-F
Sessions
Contact us for more information about session dates