Hadoop : Deploy Big Data
Duration : 4 days
Goals :
- Understanding Big Data and its challenges
- Knowing how to deploy Hadoop and its ecosystem
- Understanding HDFS, MapReduce
- Structuring data with HBase
- Writing queries with HiveQL
- Running an analysis with Pig

Elasticsearch : Search and analyze in real time
Duration : 3 days
Goals :
- Set up Elasticsearch to index documents
- Understand the Elasticsearch ecosystem

Hadoop : Install and administer a cluster of nodes
Duration : 3 days
Goals :
- Install the services of a Hadoop node
- Assemble several Hadoop nodes
- Deploy a new application on an existing cluster
- Restore data as part of disaster recovery

Data Science (R and Hadoop)
Duration : 5 days
Goals :
- Apply data mining techniques to improve business decision making from internal and external data sources
- Get a head start on your competition with structured and unstructured data analysis
- Predict an outcome by using supervised machine learning techniques

Pig, Hive and Impala
Duration : 4 days
Goals :
- Handle complex data sets stored in Hadoop without having to write complex Java code
- Automate the transfer of data into Hadoop storage with Flume and Sqoop
- Filter data with Extract-Transform-Load (ETL) operations with Pig
- Query multiple datasets for analysis with Pig and Hive

Cassandra
Duration : 3 days
Goals :
- Structure and design Cassandra databases to stay ahead of your competitors
- Apply query models to model the data of your Cassandra databases
- Access Cassandra databases with CQL and Java
- Find the right balance between read / write speed and data consistency
- Integrate Cassandra with Hadoop, Pig and Hive
- Implement the most common Cassandra design patterns

Programming Hadoop in Java
Duration : 4 days
Goals :
- Develop efficient parallel algorithms
- Analyze unstructured files and develop MapReduce Java tasks
- Load and retrieve data from HBase and Hadoop Distributed File System (HDFS)
- Write User Defined Functions (UDFs) for Hive and Pig
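The MapReduce model this course builds on can be sketched without a cluster. The following is a hypothetical, standalone plain-Java word count illustrating only the map/shuffle/reduce data flow; a real Hadoop job would instead extend the framework's Mapper and Reducer classes and read its input from HDFS.

```java
import java.util.*;
import java.util.stream.*;

// Standalone sketch of the MapReduce data flow (word count).
// Illustrative only: no Hadoop APIs are used here.
public class WordCountSketch {
    // "Map" phase: emit a (word, 1) pair for every word in every line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines)
            for (String word : line.toLowerCase().split("\\s+"))
                if (!word.isEmpty())
                    pairs.add(Map.entry(word, 1));
        return pairs;
    }

    // "Shuffle + reduce" phase: group pairs by key and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey, Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data on Hadoop", "data in HDFS");
        Map<String, Integer> counts = reduce(map(input));
        System.out.println(counts.get("data")); // "data" appears twice
    }
}
```

On a real cluster, the framework performs the shuffle step between the map and reduce phases, distributing each key's pairs to the node running its reducer.
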
Apache Spark
Duration : 4 days
Goals :
- Develop applications with Spark
- Use libraries for SQL, data flow and machine learning
- Translate real-world problems into parallel algorithms
- Develop business applications that integrate with Spark

ELK (Elasticsearch, Logstash and Kibana)
Duration : 3 days
Goals :
- Master the use of Elasticsearch, Logstash and Kibana to index, search and visualize data and documents

Apache Solr : implementing a search engine
Duration : 3 days
Goals :
- Master the use of Solr to index and search data and documents

Apache Kafka : Data exchange
Duration : 3 days
Goals :
- Understand the architecture of Kafka and its use cases
- Use the Kafka APIs
- Administer a cluster
- Build a high-availability architecture
- Secure a cluster
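As an illustration of the high-availability goal above, the fragment below shows broker settings commonly tuned for durability, assuming a hypothetical three-broker cluster; the property names are from Kafka's standard broker configuration.

```properties
# Illustrative server.properties fragment for a three-broker cluster.
# Topics replicated across 3 brokers survive the loss of one node.
default.replication.factor=3
# Require at least 2 in-sync replicas to acknowledge a write
# (takes effect for producers configured with acks=all).
min.insync.replicas=2
# Forbid an out-of-sync replica from becoming leader,
# preventing silent loss of committed messages.
unclean.leader.election.enable=false
```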