Goals
- Handle complex data sets stored in Hadoop without writing complex Java code
- Automate the transfer of data into Hadoop storage with Flume and Sqoop
- Filter data with Extract-Transform-Load (ETL) operations with Pig
- Query multiple datasets for analysis with Pig and Hive
Program
Hadoop Overview
Analyze Hadoop Components
Define Hadoop Architecture
Achieve reliable and secure storage
Monitor storage metrics
Control HDFS from the command line
Detail the MapReduce approach
Move the algorithms to the data rather than the data to the algorithms
Break down the key steps of a MapReduce task
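The key steps of a MapReduce task can be sketched in plain Python: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is a minimal word-count sketch of the model, not Hadoop's actual Java API; the function names and sample data are invented for illustration.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the list of values for each key
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"])  # -> 3
print(counts["fox"])  # -> 2
```

In Hadoop the shuffle is performed by the framework between the map and reduce tasks; only the map and reduce logic is written by the developer, which is the point of the model.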
Facilitate data ingestion and export
Aggregate data with Flume
Configure data fan in and fan out
Move relational data with Sqoop
Explain the differences between Pig and MapReduce
Identify Pig use cases
Identify key Pig configurations
Represent the data in the Pig data model
Execute the Pig Latin commands in the Grunt Shell
Express the transformations in the Pig Latin syntax
Call the load and store functions
Create new relationships with joins
Reduce data size by sampling
Extend Pig with user-defined functions
Consolidate datasets with unions
Partition datasets with splits
Add parameters in Pig scripts
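Pig Latin expresses the transformations above as relational operators (FILTER, JOIN, SPLIT, UNION). A rough Python analogue of a filter-join-split pipeline, using invented sample relations, illustrates the data-flow style of the Pig data model; it is not Pig's runtime, just the shape of the operations.

```python
# Hypothetical relations, mirroring Pig tuples: (name, age) and (name, city)
users  = [("alice", 34), ("bob", 17), ("carol", 52)]
cities = [("alice", "London"), ("carol", "Leeds")]

# FILTER users BY age >= 18
adults = [(name, age) for name, age in users if age >= 18]

# JOIN adults BY name, cities BY name
joined = [(n1, age, city)
          for n1, age in adults
          for n2, city in cities
          if n1 == n2]

# SPLIT joined INTO young IF age < 40, older IF age >= 40
young = [t for t in joined if t[1] < 40]
older = [t for t in joined if t[1] >= 40]

print(joined)  # [('alice', 34, 'London'), ('carol', 52, 'Leeds')]
```

In Pig each intermediate result (`adults`, `joined`) is a named relation, and the script is compiled into MapReduce jobs, so the same pipeline scales to data that does not fit in memory.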
Duration
4 days
Price
£2,367
Audience
Database technicians and specialists, managers, business analysts and BI professionals who want to use Big Data technologies in their business
Prerequisites
Fundamental knowledge of databases and SQL is a major asset
Reference
BUS100295-F
Break Hive down into its components
Impose structure on data with Hive
Create Hive databases and tables
Expose the differences between data types in Hive
Load and store data efficiently with SerDes
Populate tables from queries
Partition Hive tables for optimal queries
Compose HiveQL queries
Distinguish the joins available in Hive
Optimize join structure for performance
Sort, distribute and group data
Reduce query complexity with views
Improve query performance with indexes
Design Hive schemas
Establish data compression
Debug Hive scripts
Unify the data view with HCatalog
Use HCatalog to access the Hive metastore
Communicate via the HCatalog interfaces
Populate a Hive table from Pig
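Since HiveQL is SQL-like, the query shapes covered above (joins, grouping, views) can be previewed with any SQL engine. This sketch uses Python's built-in sqlite3 with made-up tables to show the kind of HiveQL a participant writes; Hive-specific features such as partitions, SerDes and the metastore are deliberately omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema, standing in for Hive tables
cur.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
cur.execute("CREATE TABLE customers (name TEXT, region TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [("alice", 10.0), ("alice", 5.0), ("bob", 20.0)])
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [("alice", "north"), ("bob", "south")])

# A join plus grouping, written as it would be in HiveQL
rows = cur.execute("""
    SELECT c.region, SUM(o.amount)
    FROM orders o JOIN customers c ON o.customer = c.name
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(rows)  # [('north', 15.0), ('south', 20.0)]

# A view that hides the join, reducing query complexity for readers
cur.execute("CREATE VIEW region_totals AS "
            "SELECT c.region AS region, SUM(o.amount) AS total "
            "FROM orders o JOIN customers c ON o.customer = c.name "
            "GROUP BY c.region")
north_total = cur.execute(
    "SELECT total FROM region_totals WHERE region = 'north'").fetchone()[0]
print(north_total)  # 15.0
```

In Hive the same statements run as distributed jobs over HDFS data, and partitioning the tables (not available in sqlite3) is what keeps such joins and aggregations fast at scale.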
Break down the fundamental components of Impala
Submit queries to Impala
Access Hive data from Impala
Reduce data access time with Spark-SQL
Query Hive data with Spark-SQL
Sessions
Contact us for more information about session dates