Developing Applications with Apache Spark

Goals

- Develop applications with Spark
- Use libraries for SQL, streaming, and machine learning
- Translate real-world problems into parallel algorithms
- Develop business applications that integrate with Spark

Program

Defining Big Data and computation
What Spark is for
The benefits of Spark

Identify the performance limits of modern CPUs
Develop traditional parallel processing models

Use functional programming to run programs in parallel (example below)
Translate real-world problems into parallel algorithms
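
For illustration, a minimal Scala sketch of this functional style, assuming a local SparkSession (the object name, data, and master setting are for demonstration only):

    import org.apache.spark.sql.SparkSession

    object FunctionalParallelism {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("FunctionalParallelism")
          .master("local[*]")  // local mode, for demonstration only
          .getOrCreate()

        // Pure functions passed to map and reduce let Spark evaluate
        // each partition independently, and therefore in parallel
        val numbers = spark.sparkContext.parallelize(1 to 1000000, numSlices = 8)
        val sumOfSquares = numbers.map(n => n.toLong * n).reduce(_ + _)
        println(s"Sum of squares: $sumOfSquares")

        spark.stop()
      }
    }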

Distribute data across the cluster with RDDs (Resilient Distributed Datasets) and DataFrames (example below)
Distribute task execution across several nodes
Run applications with the Spark execution model
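
A minimal sketch of both abstractions, again assuming a local SparkSession (names and data are illustrative):

    import org.apache.spark.sql.SparkSession

    object DistributeData {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("DistributeData")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // An RDD is a partitioned collection; each partition can live on a different node
        val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)
        println(s"RDD partitions: ${rdd.getNumPartitions}")

        // A DataFrame adds a schema on top of distributed rows,
        // letting the Catalyst optimiser plan the work
        val df = Seq(("alice", 34), ("bob", 45)).toDF("name", "age")
        df.printSchema()
        df.show()

        spark.stop()
      }
    }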

Build resilient, fault-tolerant clusters
Set up a scalable distributed storage system

Monitor and administer Spark applications
View execution plans and results (example below)
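
For example, in the Spark shell, where a SparkSession named spark is predefined, explain() prints a query's execution plan; a running application also serves a web UI (port 4040 by default) for monitoring jobs and stages:

    // Build a small aggregation and inspect how Spark plans to execute it
    val df = spark.range(1000).selectExpr("id % 10 AS key", "id AS value")
    val agg = df.groupBy("key").sum("value")

    agg.explain(true)  // parsed, analysed, optimised, and physical plans
    agg.show()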

Perform exploratory analysis with the Spark shell
Create stand-alone Spark applications (example below)
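
A minimal stand-alone application, sketched under the assumption that it is packaged as a jar and launched with spark-submit (the input path is hypothetical):

    import org.apache.spark.sql.SparkSession

    object StandaloneApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("StandaloneApp")
          .getOrCreate()  // the master URL is supplied by spark-submit

        val lines = spark.read.textFile("data/input.txt")  // hypothetical path
        println(s"Line count: ${lines.count()}")

        spark.stop()
      }
    }

It could be launched, for instance, with spark-submit --class StandaloneApp --master local[*] app.jar (the jar name is hypothetical).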

Program with Scala and other compatible languages
Create applications with basic APIs
Enrich applications with integrated libraries

Process queries with DataFrames and embedded SQL code
Extend SQL with user-defined functions (UDFs)
Use data sets in JSON and Parquet formats (example below)
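
A sketch combining these three points, assuming a local SparkSession; the file paths, view name, and UDF are hypothetical:

    import org.apache.spark.sql.SparkSession

    object SqlAndFormats {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SqlAndFormats")
          .master("local[*]")
          .getOrCreate()

        // Read a JSON data set and expose it to SQL (path is hypothetical)
        val people = spark.read.json("data/people.json")
        people.createOrReplaceTempView("people")

        // A user-defined function callable from embedded SQL
        spark.udf.register("initial", (name: String) => name.take(1).toUpperCase)

        val adults = spark.sql(
          "SELECT initial(name) AS initial, name, age FROM people WHERE age >= 18")
        adults.show()

        // Persist the result in Parquet, a columnar format
        adults.write.mode("overwrite").parquet("data/adults.parquet")

        spark.stop()
      }
    }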

Connect to databases with JDBC (example below)
Run Hive queries on external applications
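
For example, from the Spark shell, assuming the JDBC driver is on the classpath (e.g. started with spark-shell --jars postgresql.jar); the URL, table, and credentials are placeholders:

    // Load a table over JDBC into a DataFrame
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/sales")  // placeholder URL
      .option("dbtable", "orders")
      .option("user", "reader")
      .option("password", "secret")
      .load()

    orders.groupBy("status").count().show()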

Use sliding windows (example below)
Track the state of a continuous data stream
Process simultaneous data streams
Improve performance and reliability
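
A Structured Streaming sketch of a sliding window count; the socket source, host, and port are illustrative (feed it with, say, nc -lk 9999):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{explode, split, window}

    object SlidingWindows {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SlidingWindows")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Lines of text from a TCP socket, with an arrival timestamp
        val lines = spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", 9999)
          .option("includeTimestamp", true)
          .load()

        // Count words over 10-minute windows sliding every 5 minutes;
        // the watermark bounds how long Spark keeps window state
        val counts = lines
          .select(explode(split($"value", " ")).as("word"), $"timestamp")
          .withWatermark("timestamp", "10 minutes")
          .groupBy(window($"timestamp", "10 minutes", "5 minutes"), $"word")
          .count()

        counts.writeStream
          .outputMode("update")
          .format("console")
          .start()
          .awaitTermination()
      }
    }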

Process streams from built-in sources such as log files, Twitter, Kinesis, Kafka, and sockets (example below)
Develop custom receivers
Process data with the Streaming API and Spark SQL
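
A minimal DStream sketch using the built-in socket source (host and port are illustrative); custom receivers plug into this same API by extending org.apache.spark.streaming.receiver.Receiver:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SocketWordCount {
      def main(args: Array[String]): Unit = {
        // At least two local threads: one for the receiver, one for processing
        val conf = new SparkConf().setAppName("SocketWordCount").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5))

        // A built-in source: lines of text from a TCP socket
        val lines = ssc.socketTextStream("localhost", 9999)
        val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }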

Predict outcomes with supervised learning
Create a decision tree classifier (example below)
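
A sketch with the spark.ml Pipeline API; the tiny data set and column names are invented for illustration:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.DecisionTreeClassifier
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object DecisionTreeExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("DecisionTreeExample")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Invented training data: a binary label and two numeric features
        val data = Seq(
          (0.0, 1.2, 0.4), (1.0, 3.1, 2.2), (0.0, 0.9, 0.7),
          (1.0, 2.8, 1.9), (0.0, 1.1, 0.5), (1.0, 3.3, 2.4)
        ).toDF("label", "f1", "f2")

        // Assemble the feature columns into a single vector, then fit the tree
        val assembler = new VectorAssembler()
          .setInputCols(Array("f1", "f2"))
          .setOutputCol("features")
        val tree = new DecisionTreeClassifier()

        val model = new Pipeline().setStages(Array(assembler, tree)).fit(data)
        model.transform(data).select("label", "prediction").show()

        spark.stop()
      }
    }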

Group data with unsupervised learning
Create clusters with the k-means method (example below)
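
A k-means sketch with spark.ml; the 2-D points are invented to form two obvious groups:

    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object KMeansExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("KMeansExample")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Invented 2-D points forming two loose groups
        val points = Seq(
          (0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (8.0, 8.2), (8.1, 7.9), (7.8, 8.1)
        ).toDF("x", "y")

        val features = new VectorAssembler()
          .setInputCols(Array("x", "y"))
          .setOutputCol("features")
          .transform(points)

        // Ask k-means for two clusters
        val model = new KMeans().setK(2).setSeed(1L).fit(features)
        model.clusterCenters.foreach(println)
        model.transform(features).select("x", "y", "prediction").show()

        spark.stop()
      }
    }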

Expose Spark through a RESTful web service
Generate dashboards with Spark

Cloud service vs. on-premises
Choose a service provider (AWS, Azure, Databricks, etc.)

Develop Spark for large clusters
Improve the security of multi-tenant clusters
Monitor the ongoing evolution of Spark products in the market
Project Tungsten: pushing performance to the limits of modern hardware
Explore projects built with Spark
Review the architecture of Spark for mobile platforms

Duration

4 days

Price

£2,367

Audience

Developers, system architects, and technical managers who want to deploy Spark solutions in their company

Prerequisites

Proficiency in object-oriented programming in Java or C#

Reference

BUS100299-F

Sessions

Contact us for more information about session dates