
Goals


- Handle complex data sets stored in Hadoop without having to write complex Java code

- Automate the transfer of data into Hadoop storage with Flume and Sqoop

- Filter data with Extract-Transform-Load (ETL) operations in Pig

- Query multiple datasets for analysis with Pig and Hive

Program

Hadoop Overview
Analyze Hadoop Components
Define Hadoop Architecture

Achieve reliable and secure storage
Monitor storage metrics
Control HDFS from the command line
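These HDFS tasks are typically practised from the shell. A minimal sketch, assuming a running cluster; the paths are illustrative:

```shell
hdfs dfs -mkdir -p /data/raw          # create a directory in HDFS
hdfs dfs -put events.log /data/raw    # copy a local file into HDFS
hdfs dfs -ls /data/raw                # list directory contents
hdfs dfsadmin -report                 # report storage and DataNode metrics
```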

Detail the MapReduce approach
Transfer the algorithms to the data rather than the data to the algorithms
Break down the key steps of a MapReduce task

Facilitate data entry and exit
Aggregate data with Flume
Configure data fan-in and fan-out
Move relational data with Sqoop
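A Sqoop import such as the one practised here might look like the following; the connection string, table, and target directory are illustrative:

```shell
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username analyst -P \
  --table orders \
  --target-dir /data/orders \
  --num-mappers 4
```

`-P` prompts for the password instead of placing it on the command line.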

Explain the differences between Pig and MapReduce
Identify Pig use cases
Identify key Pig configurations

Represent the data in the Pig data model
Execute the Pig Latin commands in the Grunt Shell
Express the transformations in the Pig Latin syntax
Call the load and store functions
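A first Pig Latin script covering load, transform, and store might read as follows (the file name and schema are illustrative):

```pig
-- load a CSV, filter it, aggregate, and store the result
users  = LOAD 'users.csv' USING PigStorage(',') AS (id:int, name:chararray, age:int);
adults = FILTER users BY age >= 18;
by_age = GROUP adults BY age;
counts = FOREACH by_age GENERATE group AS age, COUNT(adults) AS n;
STORE counts INTO 'age_counts' USING PigStorage(',');
```

Run the commands interactively in the Grunt shell, or save them and launch `pig script.pig`.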

Create new relationships with joins
Reduce data size by sampling
Extend Pig with user-defined functions (UDFs)

Consolidate datasets with unions
Partition datasets with splits
Add parameters in Pig scripts
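The join, sample, split, union, and parameter techniques above can be sketched in one script; relation names and the `$input` parameter are illustrative:

```pig
-- $input is supplied at launch, e.g. pig -param input=orders.csv join_demo.pig
orders = LOAD '$input' USING PigStorage(',') AS (oid:int, uid:int, total:double);
users  = LOAD 'users.csv' USING PigStorage(',') AS (uid:int, name:chararray);

joined  = JOIN orders BY uid, users BY uid;   -- create new relationships
sampled = SAMPLE orders 0.1;                  -- keep roughly 10% of the rows

SPLIT orders INTO big IF total >= 100.0, small IF total < 100.0;
rebuilt = UNION big, small;                   -- consolidate the two partitions

STORE joined INTO 'joined_out' USING PigStorage(',');
```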

Duration

4 days

Price

£2,367

Audience

Database technicians and specialists, managers, business analysts and BI professionals who want to use Big Data technologies in their business

Prerequisites

Fundamental knowledge of databases and SQL is a major asset

Reference

BUS100295-F

Break Hive down into its components
Impose structure on data with Hive

Create Hive databases and tables
Expose the differences between data types in Hive
Load and store data efficiently with SerDes

Populate tables from queries
Partition Hive tables for optimal queries
Compose HiveQL queries
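In HiveQL, creating a partitioned table and populating it from a query might look like this; the table and column names are illustrative:

```sql
CREATE TABLE logs (ts STRING, msg STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- populate one partition from a staging table
INSERT INTO TABLE logs PARTITION (dt = '2024-01-01')
SELECT ts, msg FROM staging_logs WHERE event_date = '2024-01-01';

SELECT dt, COUNT(*) AS events FROM logs GROUP BY dt;
```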

Distinguish the joins available in Hive
Optimize join structure for performance

Sort, distribute and group data
Reduce query complexity with views
Improve query performance with indexes
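A map-side join hint and a simplifying view, as covered above, can be sketched as follows (table names are illustrative):

```sql
-- ask Hive to load the small table into memory for a map-side join
SELECT /*+ MAPJOIN(u) */ o.oid, u.name
FROM orders o JOIN users u ON o.uid = u.uid;

-- hide the join behind a view to reduce query complexity
CREATE VIEW order_names AS
SELECT o.oid, u.name FROM orders o JOIN users u ON o.uid = u.uid;
```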

Design Hive schemas
Establish data compression
Debug Hive scripts

Unify the data view with HCatalog
Use HCatalog to access the Hive metastore
Communicate via the HCatalog interfaces
Populate a Hive table from Pig
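Reading and writing Hive tables from Pig via HCatalog might be sketched as follows; the table names are illustrative, and the loader's package path varies across Hive versions:

```pig
-- start Grunt with: pig -useHCatalog
raw    = LOAD 'web_logs' USING org.apache.hive.hcatalog.pig.HCatLoader();
recent = FILTER raw BY dt == '2024-01-01';
STORE recent INTO 'web_logs_recent' USING org.apache.hive.hcatalog.pig.HCatStorer();
```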

Break down the fundamental components of Impala
Submit queries to Impala
Access Hive data from Impala

Reduce data access time with Spark-SQL
Query Hive data with Spark-SQL
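From the spark-sql shell, which shares the Hive metastore, such a query might look like this (the table name is illustrative):

```sql
SELECT dt, COUNT(*) AS hits
FROM logs
GROUP BY dt
ORDER BY hits DESC
LIMIT 10;
```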

Sessions

Contact us for more information about session dates.