Goals
- Develop efficient parallel algorithms
- Analyze unstructured files and develop MapReduce jobs in Java
- Load and retrieve data from HBase and Hadoop Distributed File System (HDFS)
- Write user-defined functions for Hive and Pig
Program
Evaluate the value that Hadoop can bring to the business
Examine the Hadoop ecosystem
Choose a suitable distribution model
Examine the difficulties associated with running parallel programs: algorithms, data exchange
Evaluate the storage mode and complexity of Big Data
Fragment and solve large-scale problems
Discover tasks compatible with MapReduce
Solve common business problems
Configure the development environment
Examine the Hadoop distribution
Study the Hadoop daemons
Create the different components of MapReduce tasks
Analyze the different stages of MapReduce processing: split, map, shuffle and reduce
Choose and use multiple mappers and reducers, leverage partitioners and built-in map and reduce functions, analyze time series data with a secondary sort, stream tasks through different programming languages
Run algorithms: parallel sorts, joins and searches, analyze log files, social media data and emails
Identify network-bound, processor-bound, and disk I/O-bound parallel algorithms
Distribute workload with partitioners
Control grouping and sort order with comparators
Measure performance with counters
Optimize data throughput performance
Use redundancy to recover data
Duration
4 days
Price
£2821
Audience
Anyone who will use, administer, or deploy SharePoint in an organization
Anyone who wants to develop or administer SharePoint applications
Prerequisites
Experience at the level of course 471, Java Programming: Fundamentals, or more than 6 months of Java programming experience
Reference
BUS100298-F
Analyze the structure and organization of HDFS
Load raw data and retrieve the result
Read and write data with a program
Manipulate Hadoop’s SequenceFile types
Share reference data with DistributedCache
Switch from structured storage to unstructured storage
Apply NoSQL principles with schema on read, connect to HBase from MapReduce jobs, compare HBase with other types of NoSQL datastores
Structure databases, tables, views and partitions
Integrate MapReduce jobs with Hive queries
Launch queries with HiveQL
Access Hive servers via JDBC, add functionality to HiveQL with user-defined functions
Develop Pig Latin scripts to consolidate workflows, integrate Pig queries with Java
Interact with data through the Grunt console
Extend Pig with user-defined functions
Record important events to audit and debug
Validate specifications with MRUnit
Debug in local mode
Deploy the solution on a production cluster, use administration tools to optimize performance, monitor task execution via web user interfaces
Sessions
Contact us for more information about session dates