
Goals


- Develop efficient parallel algorithms

- Analyze unstructured files and develop MapReduce jobs in Java

- Load and retrieve data from HBase and Hadoop Distributed File System (HDFS)

- Extend Hive and Pig with user-defined functions

Program

Evaluate the value that Hadoop can bring to the business
Examine the Hadoop ecosystem
Choose a suitable distribution model

Examine the challenges of running parallel programs: algorithms and data exchange
Evaluate how Big Data is stored and the complexity it introduces

Fragment and solve large-scale problems
Discover tasks compatible with MapReduce
Solve common business problems

Configure the development environment
Examine the Hadoop distribution
Study the Hadoop daemons
Create the components of a MapReduce job
Analyze the stages of MapReduce processing: split, map, shuffle and reduce
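To make the four stages concrete, here is a minimal word-count sketch in plain Java with no Hadoop dependency. It assumes the input is a list of text lines and simulates each stage in memory; the class and method names are hypothetical. A real job would implement the same logic in Hadoop's Mapper and Reducer classes and let the framework perform the split and shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of split, map, shuffle and reduce (not the Hadoop API).
class MapReduceStages {

    // Split: cut the input into chunks of lines (Hadoop splits by byte ranges of HDFS blocks).
    static List<List<String>> split(List<String> lines, int linesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            splits.add(lines.subList(i, Math.min(i + linesPerSplit, lines.size())));
        }
        return splits;
    }

    // Map: emit a (word, 1) pair for every word in one split.
    static List<Map.Entry<String, Integer>> map(List<String> split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : split) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle: group values by key; TreeMap keeps keys sorted, as Hadoop does before reducing.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce: sum the values collected for each key.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> groups) {
        TreeMap<String, Integer> result = new TreeMap<>();
        groups.forEach((word, ones) -> result.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    static TreeMap<String, Integer> wordCount(List<String> lines, int linesPerSplit) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (List<String> s : split(lines, linesPerSplit)) {
            mapped.addAll(map(s)); // in Hadoop, each split is processed by its own map task
        }
        return reduce(shuffle(mapped));
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The fox");
        System.out.println(wordCount(input, 2)); // {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
    }
}
```

Keeping each stage as a separate method mirrors the division of labour in Hadoop: only map and reduce are user code, while split and shuffle are supplied by the framework.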

Choose and use multiple mappers and reducers, leverage partitioners and built-in map and reduce functions, analyze time series data with a secondary sort, and run tasks written in other programming languages with Hadoop Streaming
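A secondary sort delivers each key's values to the reducer already ordered by a second field, which suits time series. The sketch below simulates the idea in plain Java, assuming a hypothetical `Reading` record where the sensor is the natural key and the timestamp is the secondary key. In a Hadoop job the same effect comes from a composite key, a sort comparator and a grouping comparator; none of that API is used here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java illustration of a secondary sort (not the Hadoop API).
class SecondarySortSketch {

    // Natural key: sensor. Secondary key: timestamp.
    record Reading(String sensor, long timestamp, double value) {}

    // Sort order: natural key first, then secondary key (the composite-key comparator).
    static final Comparator<Reading> SORT_ORDER =
        Comparator.comparing(Reading::sensor).thenComparingLong(Reading::timestamp);

    // Grouping: after sorting, consecutive readings with the same sensor form one reduce group,
    // so each group's values arrive in time order.
    static Map<String, List<Double>> sortAndGroup(List<Reading> readings) {
        List<Reading> sorted = new ArrayList<>(readings);
        sorted.sort(SORT_ORDER);
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (Reading r : sorted) {
            groups.computeIfAbsent(r.sensor(), k -> new ArrayList<>()).add(r.value());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Reading> input = List.of(
            new Reading("s2", 30, 7.0), new Reading("s1", 20, 2.0),
            new Reading("s2", 10, 5.0), new Reading("s1", 10, 1.0));
        System.out.println(sortAndGroup(input)); // {s1=[1.0, 2.0], s2=[5.0, 7.0]}
    }
}
```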

Implement parallel algorithms for sorting, joins and searches; analyze log files, social media data and emails
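Log analysis is a typical map-then-aggregate problem. As a single-machine analogy only, the sketch below counts HTTP status codes in web-server log lines using a Java parallel stream: the per-line parsing plays the role of the map step and the concurrent grouping collector plays the role of the reduce step. It assumes Common Log Format input; the class name and regular expression are illustrative choices, not part of Hadoop.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical single-machine analogy of a parallel log analysis (not the Hadoop API).
class LogStatusCount {

    // Status code that follows the quoted request, e.g. ..."GET / HTTP/1.0" 200 2326
    private static final Pattern STATUS = Pattern.compile("\" (\\d{3}) ");

    // "Map" step: extract the status code, or nothing for lines that do not parse.
    static Optional<String> statusOf(String line) {
        Matcher m = STATUS.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Parse lines in parallel, then group and count by status code ("reduce" step).
    static Map<String, Long> countStatuses(List<String> lines) {
        return lines.parallelStream()
            .map(LogStatusCount::statusOf)
            .flatMap(Optional::stream)
            .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326",
            "10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] \"GET /missing HTTP/1.0\" 404 0",
            "not a log line");
        System.out.println(countStatuses(lines));
    }
}
```

Malformed lines are skipped rather than failing the job, which is the usual choice when processing large, messy log collections.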

Identify parallel algorithms that are bound by network, processor or disk I/O
Distribute workload with partitioners
Control grouping and sort order with comparators
Measure performance with counters
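The partitioner and counter ideas can be sketched without a cluster. The `partitionFor` method below uses the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so every record with a given key reaches the same reducer. Counters are simulated with an `EnumMap`; in a real job they would be declared as an enum and incremented through the task context (for example `context.getCounter(...).increment(1)`). The "key,value" record format and all names here are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of partitioning and counting (not the Hadoop API).
class PartitionAndCountSketch {

    // Counter names, in the style of the enums Hadoop jobs use for custom counters.
    enum RecordCounter { VALID, MALFORMED }

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit, then take the modulo.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Routes "key,value" records to reducer buckets and counts valid vs. malformed records.
    static List<List<String>> distribute(List<String> records, int numReducers,
                                         Map<RecordCounter, Long> counters) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String record : records) {
            String[] fields = record.split(",", 2);
            if (fields.length != 2 || fields[0].isEmpty()) {
                counters.merge(RecordCounter.MALFORMED, 1L, Long::sum);
                continue;
            }
            counters.merge(RecordCounter.VALID, 1L, Long::sum);
            buckets.get(partitionFor(fields[0], numReducers)).add(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<RecordCounter, Long> counters = new EnumMap<>(RecordCounter.class);
        List<List<String>> buckets = distribute(
            List.of("uk,3", "fr,1", "uk,5", "garbage", "de,2"), 3, counters);
        System.out.println(buckets);
        System.out.println(counters); // {VALID=4, MALFORMED=1}
    }
}
```

Masking with `Integer.MAX_VALUE` matters because `hashCode()` can be negative, and a negative modulo would produce an invalid partition number.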

Optimize data throughput performance
Use redundancy to recover data

Duration

4 days

Price

£2,821

Audience

Anyone who will use, administer or deploy Hadoop in an organization

Anyone who wants to develop or administer Hadoop applications

Prerequisites

Completion of course 471, Java Programming: Fundamentals, or more than six months of Java programming experience

Reference

BUS100298-F

Analyze the structure and organization of HDFS
Load raw data and retrieve the result
Read and write data with a program
Manipulate Hadoop’s SequenceFile types
Share reference data with DistributedCache
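Sharing reference data through the DistributedCache is what makes a map-side join possible: every mapper loads a small lookup file once and enriches each record locally, with no shuffle. The sketch below simulates that pattern in plain Java, assuming tab-separated lines and a hypothetical country-code lookup. In a real job the file would be registered with the cache (newer APIs expose this as `Job.addCacheFile`) and loaded in the mapper's `setup()` method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a map-side join with cached reference data (not the Hadoop API).
class MapSideJoinSketch {

    // Done once per mapper (in setup()): load the small reference file into memory.
    static Map<String, String> loadReference(List<String> referenceLines) {
        Map<String, String> names = new HashMap<>();
        for (String line : referenceLines) {
            String[] f = line.split("\t", 2);
            if (f.length == 2) {
                names.put(f[0], f[1]);
            }
        }
        return names;
    }

    // Done per record (in map()): enrich the record with the reference value.
    static List<String> join(List<String> records, Map<String, String> reference) {
        List<String> out = new ArrayList<>();
        for (String record : records) {
            String[] f = record.split("\t", 2);
            if (f.length < 2) {
                continue; // skip malformed records
            }
            String name = reference.getOrDefault(f[0], "UNKNOWN");
            out.add(f[0] + "\t" + name + "\t" + f[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> ref = loadReference(List.of("uk\tUnited Kingdom", "fr\tFrance"));
        join(List.of("uk\t3", "de\t2", "fr\t1"), ref).forEach(System.out::println);
    }
}
```

This only works when the reference data fits in each mapper's memory; larger tables call for a reduce-side join instead.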

Switch from structured storage to unstructured storage
Apply NoSQL principles with a sample application that reads data, connect to HBase from MapReduce tasks, and compare HBase with other types of NoSQL datastores

Organize data into Hive databases, tables, views and partitions
Integrate MapReduce jobs with Hive queries
Run queries with HiveQL
Access Hive servers via JDBC, add functionality to HiveQL with user-defined functions

Develop Pig Latin scripts to consolidate workflows, integrate Pig queries with Java
Interact with data through the Grunt console
Extend Pig with user-defined functions

Log important events for auditing and debugging
Validate specifications with MRUnit
Debug in local mode

Deploy the solution on a production cluster, use administration tools to optimize performance, monitor task execution via web user interfaces

Sessions

Contact us for more information about session dates