
Goals


- Understanding Big Data and its challenges

- Knowing how to deploy Hadoop and its ecosystem

- Understanding HDFS, MapReduce

- Structuring data with HBase

- Writing queries with HiveQL

- Running an analysis with Pig

Program

What is Big Data?
Data sources: humans and machines
The problem of size
Hadoop’s position in the landscape

The origin of the project
The HDFS filesystem
Understanding the MapReduce algorithm
The Hadoop environment: HBase, ZooKeeper, Hive, Pig…
The YARN API

From stand-alone mode to fully distributed clustered mode
Prerequisites, Hadoop distributions
Hadoop cluster: NameNode, ResourceManager, DataNode, NodeManager
Configuration files
Basic operations on the HDFS cluster: formatting, starting, stopping

Practical workshop: install Hadoop on 2 nodes, format and manipulate HDFS

The benefits of MapReduce
Mappers, reducers, parallelism and independence of processing
Inputs, outputs
Submission of a job to Hadoop

Practical workshop: running a task via MapReduce, with output to HDFS

Random access, real time, read-write to Big Data
Features of HBase, NoSQL
Prerequisites, configuration
Handling via the HBase shell

Practical workshop: setting up HBase on Hadoop, creating and handling a table

Presentation of Hive
Manage the schema: databases, tables, views, partitions
Data manipulation, queries and MapReduce with HiveQL
Auditing and error logs

Practical workshop: loading big data into Hive and querying it

Presentation, installation of the Apache Pig project
Running Pig locally and in MapReduce mode
Scripting for Pig
The Pig Latin language
Data manipulation and storage with Pig

Practical workshop: write a Pig Latin script for a simple task and execute it locally, then in MapReduce mode

Managing logs and auditing Hadoop tasks
Discover MRUnit for unit tests in Hadoop
Local debugging
Performance monitoring

Practical workshop: setting up a more complex MapReduce job with traces and unit tests

Duration

4 days

Price

£ 1994

Audience

System administrators

Prerequisites

Knowledge of system administration; familiarity with Java is preferable

Reference

BUS869-F

Sessions

Contact us for more information about session dates