Goals
- Develop applications with Spark
- Use libraries for SQL, streaming and machine learning
- Translate real-world problems into parallel algorithms
- Develop business applications that integrate with Spark
Program
Defining Big Data and Calculations
What is Spark for
What are the benefits of Spark
Identify the performance limits of modern CPUs
Develop traditional parallel processing models
Use functional programming to execute programs in parallel (see the sketch below)
Translate real-world problems into parallel algorithms
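A minimal sketch of the functional idea in plain Scala (assuming Scala 2.12, where parallel collections are built in): because the function passed to map is pure, the same pipeline runs sequentially or in parallel without changes; Spark generalises this model to a whole cluster.

    object FunctionalParallelism {
      def main(args: Array[String]): Unit = {
        val numbers = (1 to 1000000).toVector
        // A pure function: no shared mutable state, so the runtime may
        // evaluate it on any thread, in any order.
        val square = (n: Int) => n.toLong * n
        val sequentialSum = numbers.map(square).sum      // one core
        val parallelSum   = numbers.par.map(square).sum  // all cores
        println(s"sequential=$sequentialSum parallel=$parallelSum")  // identical results
      }
    }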
Distribute data across the cluster with RDDs (Resilient Distributed Datasets) and DataFrames (sketched below)
Distribute task execution across multiple nodes
Launch applications with the Spark execution model
Build resilient, fault-tolerant clusters
Set up a scalable distributed storage system
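A minimal sketch of the two data abstractions (assuming Spark 2.x with spark-sql on the classpath; the local[*] master and partition count are illustrative):

    import org.apache.spark.sql.SparkSession

    object DistributedData {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("DistributedData")
          .master("local[*]")  // replace with the cluster's master URL in production
          .getOrCreate()

        // RDD: the low-level API. The second argument sets the number of
        // partitions, the units of work Spark spreads across executor nodes.
        val rdd = spark.sparkContext.parallelize(1 to 1000000, 8)
        println(s"partitions=${rdd.getNumPartitions} sum=${rdd.sum()}")

        // DataFrame: the higher-level, schema-aware API on the same engine.
        import spark.implicits._
        val df = rdd.map(n => (n, n % 10)).toDF("value", "bucket")
        df.groupBy("bucket").count().show()

        spark.stop()
      }
    }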
Monitoring and Administering Spark Applications
View execution plans and results
Perform exploratory analysis with the Spark shell
Create stand-alone Spark applications (example below)
Programming with Scala and other compatible languages
Create applications with the core APIs
Enrich applications with the built-in libraries
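As a concrete example of a stand-alone application, a minimal word count (Spark 2.x, Scala; the input path is passed on the command line):

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        // No master URL here: it is supplied by spark-submit at launch time.
        val spark = SparkSession.builder().appName("WordCount").getOrCreate()
        spark.sparkContext.textFile(args(0))
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .take(20)
          .foreach { case (word, count) => println(s"$word: $count") }
        spark.stop()
      }
    }

Packaged with sbt, it would be launched with something like spark-submit --class WordCount --master yarn wordcount.jar input.txt (the jar and file names are illustrative).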
Process queries with DataFrames and embedded SQL code (sketch below)
Extend SQL with user-defined functions (UDFs)
Use data sets in JSON and Parquet formats
Connect to databases with JDBC
Run Hive queries on external applications
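A minimal sketch tying several of these items together (Spark 2.x, Scala): reading a Parquet file, registering a UDF, and querying with embedded SQL. The file name and column are illustrative assumptions, not from the course material.

    import org.apache.spark.sql.SparkSession

    object SqlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("SqlSketch").master("local[*]").getOrCreate()

        // Self-describing formats; spark.read.json(...) works the same way,
        // and JDBC sources use spark.read.format("jdbc") with url/dbtable options.
        val people = spark.read.parquet("people.parquet")
        people.createOrReplaceTempView("people")

        // UDF: an ordinary Scala function registered for use inside SQL.
        spark.udf.register("initial", (name: String) => name.take(1).toUpperCase)

        spark.sql(
          "SELECT initial(name) AS letter, COUNT(*) AS n FROM people GROUP BY initial(name)"
        ).show()

        spark.stop()
      }
    }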
Use sliding windows (see the example below)
Track the state of a continuous data stream
Process simultaneous data streams
Improve performance and reliability
Process streams from integrated sources (log files, Twitter, Kinesis, Kafka, sockets)
Develop custom receivers
Process data with the Streaming API and Spark SQL
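A minimal sketch of windowed and stateful processing with the DStream API (Spark 2.x, Scala), using a socket source; Kafka and Kinesis attach through their own connector artifacts. The host, port and checkpoint path are illustrative.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("StreamingSketch").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5))  // 5-second micro-batches
        ssc.checkpoint("checkpoint")  // needed by stateful operations

        val words = ssc.socketTextStream("localhost", 9999)
          .flatMap(_.split("\\s+"))
          .map((_, 1))

        // Sliding window: counts over the last 30 seconds, refreshed every 10.
        words.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10)).print()

        // State across the whole stream: a running total per word.
        words.updateStateByKey[Int]((batch: Seq[Int], state: Option[Int]) =>
          Some(batch.sum + state.getOrElse(0))).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }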
Predict outcomes with supervised learning
Build a decision-tree classifier
Group data with unsupervised learning
Cluster data with the k-means method
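A minimal sketch of unsupervised learning with spark.ml (Spark 2.x, Scala): k-means clustering of a few 2-D points into two groups. The toy data is illustrative; a decision-tree classifier follows the same fit/transform pattern via org.apache.spark.ml.classification.DecisionTreeClassifier.

    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.linalg.Vectors
    import org.apache.spark.sql.SparkSession

    object KMeansSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("KMeansSketch").master("local[*]").getOrCreate()
        import spark.implicits._

        // Two obvious groups of points, expressed as a "features" column.
        val points = Seq(
          Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.2),
          Vectors.dense(9.0, 9.1), Vectors.dense(8.9, 9.3)
        ).map(Tuple1.apply).toDF("features")

        val model = new KMeans().setK(2).setSeed(1L).fit(points)
        model.clusterCenters.foreach(println)  // the two learned centroids
        model.transform(points).show()         // each point with its cluster id

        spark.stop()
      }
    }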
Expose Spark through a RESTful web service
Generate dashboards with Spark
Cloud service vs. on-premises
Choose a service provider (AWS, Azure, Databricks, etc.)
Deploy Spark on large clusters (configuration sketch below)
Improve the security of multi-vendor clusters
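A minimal configuration sketch for a larger deployment (Spark 2.x, Scala); the values are illustrative assumptions to be tuned per cluster, and each key can equally be passed to spark-submit via --conf.

    import org.apache.spark.sql.SparkSession

    object ClusterConfig {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ClusterConfig")                           // master URL comes from spark-submit
          .config("spark.executor.memory", "8g")              // heap per executor JVM
          .config("spark.executor.cores", "4")                // concurrent tasks per executor
          .config("spark.dynamicAllocation.enabled", "true")  // scale executors with load (needs the external shuffle service)
          .config("spark.authenticate", "true")               // shared-secret authentication between Spark processes
          .getOrCreate()

        println(spark.sparkContext.getConf.toDebugString)     // dump the effective configuration
        spark.stop()
      }
    }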
Tracking the ongoing evolution of Spark products in the market
Project Tungsten: pushing performance toward the limits of modern hardware
Use projects developed with Spark
Review the architecture of Spark for mobile platforms
Duration
4 days
Price
£2,367
Audience
Developers, system architects and technical managers who want to deploy Spark solutions in their company
Prerequisites
Proficiency in object-oriented programming in Java or C#
Reference
BUS100299-F
Sessions
Contact us for more information about session dates