This Apache Spark and Scala Hadoop online training will advance your expertise in the Big Data Hadoop ecosystem. Trainees will learn about RDDs and the libraries Spark offers, such as Spark Streaming, MLlib, Spark SQL, and GraphX. This course is an integral part of a developer's learning path. A basic understanding of functional and object-oriented programming will help, and prior knowledge of Scala is an added benefit.
Preview
In the Apache Spark and Scala Hadoop certification training, you will:
1. Gain a clear understanding of the limitations of MapReduce and the role of Spark in overcoming them
2. Understand the fundamentals of the Scala programming language and its features
3. Build expertise in using RDDs to create applications in Spark
4. Master SQL queries using Spark SQL
Course Contents
Day 1
Introduction to Big Data and Spark
Overview of Big Data and Spark
MapReduce limitations
Spark History
Spark Architecture
Spark and Hadoop Advantages
Benefits of Spark + Hadoop
Introduction to the Spark Ecosystem
Spark Installation
Introduction to Scala
Scala foundation
Features of Scala
Set Up Spark and Scala on Ubuntu and Windows OS
Install IDEs for Scala
Run Scala Code in the Scala Shell
Understanding Data types in Scala
Implementing Lazy Values
Control Structures
Looping Structures
Functions
Procedures
Collections
Arrays and Array Buffers
Maps, Tuples and Lists
Day 2
Object Oriented Programming in Scala
Implementing Classes
Implementing Getters & Setters
Object & Object Private Fields
Implementing Nested Classes
Using Auxiliary Constructors
Primary Constructor
Companion Object
Apply Method
Understanding Packages
Override Methods
Type Checking
Casting
Abstract Classes
Day 3
Functional Programming in Scala
Understanding Functional programming in Scala
Implementing Traits
Layered Traits
Rich Traits
Anonymous Functions
Higher Order Functions
Closures and Currying
Performing File Processing
Foundation to Spark
Spark Shell and PySpark
Basic operations on Shell
Spark Java projects
Spark Context and Spark Properties
Persistence in Spark
Accessing HDFS Data from Spark
Implementing Server Log Analysis using Spark
Day 4
Working with Resilient Distributed Datasets (RDD)
Understanding RDD
Loading data into RDD
Scala RDD, Paired RDD, Double RDD & General RDD Functions
Implementing HadoopRDD, Filtered RDD, Joined RDD
Transformations, Actions and Shared Variables
Spark Operations on Hadoop YARN
Sequence File Processing
Partitioner and its role in Performance improvement
Day 5
Spark Ecosystem - Spark Streaming & Spark SQL
Introduction to Spark Streaming
Introduction to Spark SQL
Querying Files as Tables
Text File Format
JSON File Format
Parquet File Format
Hive and Spark SQL Architecture
Integrating Spark & Apache Hive
Spark SQL performance optimization
Implementing Data visualization in Spark