Goals
- Integrate Big Data components to build an appropriate data lake
- Select suitable Big Data warehouses to manage multiple data sets
- Process large data sets with Hadoop to facilitate technical and business decision making
- Query voluminous data sets in real time
Program
The four dimensions of Big Data: volume, velocity, variety, veracity
Presentation of the MapReduce framework, storage and queries
Measure the importance of Big Data within a company
Succeed in extracting useful data
Integrate Big Data with traditional data
Select the data sources to analyze
Remove duplicates
Define the role of NoSQL
Data models: key-value, graph, document, column family (a brief sketch follows this list)
Hadoop Distributed File System (HDFS)
HBase
Hive
Cassandra
Hypertable
Amazon S3
BigTable
DynamoDB
MongoDB
Redis
Riak
Neo4J
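To make the difference between these data models concrete, here is a minimal sketch contrasting a key-value store (Redis) with a document store (MongoDB). It assumes local servers on their default ports and the redis and pymongo client libraries; the keys, database and field names are purely illustrative.

    import redis
    from pymongo import MongoClient

    # Key-value model: opaque values addressed by a single key.
    kv = redis.Redis(host="localhost", port=6379)
    kv.set("customer:42:name", "Acme Ltd")
    print(kv.get("customer:42:name"))          # b'Acme Ltd'

    # Document model: self-describing, nested records that can be queried on their fields.
    db = MongoClient("mongodb://localhost:27017")["demo"]
    db.customers.insert_one({"_id": 42, "name": "Acme Ltd", "orders": [{"sku": "X1", "qty": 3}]})
    print(db.customers.find_one({"name": "Acme Ltd"}))

The key-value store treats each value as opaque and retrieves it only by its key, while the document store can filter on fields inside the nested record; this difference drives the warehouse choices discussed below.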
Choose a data warehouse based on the characteristics of your data
Inject code into the data
Implement multilingual data storage solutions
Choose a data warehouse capable of aligning with business objectives
Map data with a programming framework
Connect to the data and extract it from the storage warehouse
Transform the data for processing
Split data for Hadoop MapReduce
Create the components of a Hadoop MapReduce job (see the mapper/reducer sketch after this module)
Distribute data processing across multiple server farms
Run Hadoop MapReduce jobs
Monitor the progress of job flows
Identify Hadoop Daemons
Examine the Hadoop Distributed File System (HDFS)
Choose Execution Mode: Local, Pseudo-distributed, Fully Distributed
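To illustrate what the components of a MapReduce job look like, here is a minimal word-count sketch written for Hadoop Streaming, which pipes records through standard input and output. The file name, HDFS paths and the exact hadoop jar invocation are placeholders that depend on your installation.

    #!/usr/bin/env python3
    # wordcount.py -- minimal mapper/reducer pair for Hadoop Streaming.
    # Local test:  cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce
    # On a cluster (placeholder paths):
    #   hadoop jar hadoop-streaming.jar -files wordcount.py -input /data/in -output /data/out \
    #     -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce"
    import sys
    from itertools import groupby

    def mapper():
        # Emit one "word<TAB>1" record per word.
        for line in sys.stdin:
            for word in line.split():
                print(f"{word}\t1")

    def reducer():
        # Hadoop sorts the mapper output by key, so counts for a word arrive contiguously.
        pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            print(f"{word}\t{sum(int(count) for _, count in group)}")

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "map" else reducer()

Piping a text file through the mapper, sort and the reducer on a single machine mimics the shuffle phase and is a convenient way to test the job in local mode before distributing it.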
Compare real-time processing models
Use Storm to extract live events
Fast in-memory processing with Spark and Shark (a brief sketch follows)
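For comparison with the MapReduce version above, here is a rough sketch of the same word count expressed with Spark's Python API; it assumes a local Spark installation and uses a placeholder input path.

    from pyspark.sql import SparkSession

    # Start a local Spark session (placeholder application name).
    spark = SparkSession.builder.appName("wordcount-sketch").master("local[*]").getOrCreate()

    counts = (
        spark.sparkContext.textFile("hdfs:///data/in")   # placeholder path
        .flatMap(lambda line: line.split())              # one record per word
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)                 # shuffle and sum per word
    )
    for word, total in counts.take(10):
        print(word, total)
    spark.stop()

Because Spark keeps intermediate results in memory and evaluates the chain of transformations lazily, iterative and interactive workloads typically run much faster than the equivalent sequence of MapReduce jobs.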
Communicate with Hadoop in Pig Latin
Execute commands with the Grunt shell
Streamline high-level processing
Ensure data persistence in the Hive Metastore
Launch queries with HiveQL (a brief sketch follows this module)
Examine Hive file formats
Analyze data with Mahout
Use reporting tools to display processing results
Query data in real time with Impala
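As a taste of HiveQL, the sketch below submits a query to a HiveServer2 endpoint from Python, assuming the pyhive client library; the host, database, table and column names are all hypothetical.

    from pyhive import hive

    conn = hive.Connection(host="hive-server.example.com", port=10000, database="sales")
    cur = conn.cursor()
    # HiveQL: aggregate a (hypothetical) partitioned orders table.
    cur.execute("""
        SELECT region, COUNT(*) AS nb_orders, SUM(amount) AS revenue
        FROM orders
        WHERE ds = '2024-01-01'
        GROUP BY region
        ORDER BY revenue DESC
    """)
    for region, nb_orders, revenue in cur.fetchall():
        print(region, nb_orders, revenue)
    conn.close()

Impala accepts a very similar SQL dialect over the same tables, so a comparable query can typically be submitted through an Impala client (for example impyla) when lower-latency, interactive answers are needed.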
Define Big Data needs
Achieve objectives through relevant data
Evaluate the various Big Data tools on the market
Meet the expectations of company personnel
Identify the importance of business processes
Identify the problem
Choose the right tools
Obtain exploitable results
Choose the right providers and hosting options
Find the right balance between costs incurred and value delivered to the company
Stay ahead
Duration
3 days
Price
£1,804
Audience
Anyone wishing to take advantage of the many benefits of technologies dedicated to Big Data
Prerequisites
Working knowledge of the Microsoft Windows platform
Programming concepts are useful but not compulsory
Reference
BAS100301-F
Sessions
Contact us for more information about session dates