Apache Spark and Scala Training

The Apache Spark and Scala Hadoop online training advances your expertise in the Big Data Hadoop ecosystem. Trainees learn about RDDs and the APIs and libraries that Spark offers, such as Spark Streaming, MLlib, Spark SQL, and GraphX. This course is an integral part of a developer's learning path. A basic understanding of functional and object-oriented programming will help, and prior knowledge of Scala is an added benefit.

Preview

In the Apache Spark and Scala Hadoop certification training, you will:

1. Get a clear understanding of the limitations of MapReduce and the role of Spark in overcoming them
2. Understand the fundamentals of the Scala programming language and its features
3. Gain expertise in using RDDs to create Spark applications
4. Master SQL queries using Spark SQL

Course Contents

Day 1

Introduction to Big Data and Spark

Overview of Big Data and Spark
MapReduce limitations
Spark History
Spark Architecture
Spark and Hadoop Advantages
Benefits of Spark + Hadoop
Introduction to Spark Eco-system
Spark Installation

Introduction to Scala

Scala foundation
Features of Scala
Set up Spark and Scala on Ubuntu and Windows
Install IDEs for Scala
Run Scala code in the Scala shell
Understanding Data types in Scala
Implementing Lazy Values
Control Structures
Looping Structures
Functions
Procedures
Collections
Arrays and Array Buffers
Maps, Tuples and Lists
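
As a taste of the Scala topics listed above, here is a minimal sketch of lazy values and the basic collections. The object name `ScalaBasicsSketch` and all values in it are made up for illustration and are not taken from the course material.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical example object; names and values are illustrative only.
object ScalaBasicsSketch {
  // Counts how many times the lazy initializer below has run.
  var initCount = 0

  // A lazy val is evaluated on first access and then cached.
  lazy val expensive: Int = { initCount += 1; 42 }

  // Array: fixed length, elements can be updated in place.
  val fixed: Array[Int] = Array(1, 2, 3)

  // ArrayBuffer: a growable, mutable sequence.
  val growable: ArrayBuffer[Int] = ArrayBuffer(1, 2)
  growable += 3

  // Immutable Map, Tuple and List.
  val wordCounts: Map[String, Int] = Map("spark" -> 1, "scala" -> 2)
  val pair: (String, Int) = ("spark", 1)
  val doubled: List[Int] = List(1, 2, 3).map(_ * 2)
}
```

Reading `ScalaBasicsSketch.expensive` twice runs the initializer only once, which is the point of a lazy value.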

Day 2

Object Oriented Programming in Scala

Implementing Classes
Implementing Getter & Setter
Object & Object Private Fields
Implementing Nested Classes
Using Auxiliary Constructor
Primary Constructor
Companion Object
Apply Method
Understanding Packages
Override Methods
Type Checking
Casting
Abstract Classes
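
Several of the Day 2 topics fit into one short sketch: an abstract class, a primary and an auxiliary constructor, a getter/setter pair, a companion object with an `apply` method, and method overriding. The `Shape` and `Rectangle` names are hypothetical and serve only as an illustration.

```scala
// Abstract class: declares `area` and provides a default `describe`.
abstract class Shape {
  def area: Double
  def describe: String = s"shape with area $area"
}

// Primary constructor takes width and height.
class Rectangle(val width: Double, val height: Double) extends Shape {
  // Auxiliary constructor: builds a square from a single side length.
  def this(side: Double) = this(side, side)

  // Private field exposed through an explicit getter and setter.
  private var _label: String = "rectangle"
  def label: String = _label
  def label_=(value: String): Unit = { _label = value }

  // Implements the abstract member and overrides the concrete one.
  override def area: Double = width * height
  override def describe: String = s"$label: ${super.describe}"
}

// Companion object: `apply` allows Rectangle(2.0, 3.0) without `new`.
object Rectangle {
  def apply(width: Double, height: Double): Rectangle =
    new Rectangle(width, height)
}
```

With these definitions, `Rectangle(2.0, 3.0)` goes through the companion's `apply`, `new Rectangle(4.0)` uses the auxiliary constructor, and a value typed as `Shape` can be checked with `isInstanceOf[Rectangle]` and cast with `asInstanceOf[Rectangle]`.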

Day 3

Functional Programming in Scala

Understanding Functional programming in Scala
Implementing Traits
Layered Traits
Rich Traits
Anonymous Functions
Higher Order Functions
Closures and Currying
Performing File Processing
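
The functional programming topics above can be sketched in a few lines. Everything here (`FunctionalSketch`, `Greeter`, `applyTwice`, and so on) is a hypothetical name chosen for illustration, not part of any library or of the course material.

```scala
// Hypothetical example object; names are illustrative only.
object FunctionalSketch {
  // Anonymous function assigned to a value.
  val square: Int => Int = x => x * x

  // Higher-order function: takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Closure: the returned function captures `n` from its enclosing scope.
  def makeAdder(n: Int): Int => Int = x => x + n

  // Curried method: arguments are supplied one parameter list at a time.
  def scale(factor: Int)(x: Int): Int = factor * x
  val triple: Int => Int = scale(3)

  // Layered traits: each layer wraps the result of the layer before it.
  trait Greeter {
    def greet(name: String): String = s"hello $name"
  }
  trait Loud extends Greeter {
    override def greet(name: String): String = super.greet(name).toUpperCase
  }
  trait Excited extends Greeter {
    override def greet(name: String): String = super.greet(name) + "!"
  }

  // Traits are applied right to left: Excited calls into Loud, then Greeter.
  object LoudExcited extends Loud with Excited
}
```

The order in which traits are mixed in matters: in `Loud with Excited`, the call to `super.greet` inside `Excited` resolves to `Loud`, so the text is upper-cased before the exclamation mark is appended.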

Foundation to Spark

Spark Shell and PySpark
Basic operations on Shell
Spark Java projects
Spark Context and Spark Properties
Persistence in Spark
Accessing HDFS data from Spark
Implementing Server Log Analysis using Spark

Day 4

Working with Resilient Distributed Datasets (RDD)

Understanding RDD
Loading data into RDD
Scala RDD, Pair RDD, Double RDD & General RDD Functions
Implementing HadoopRDD, Filtered RDD, Joined RDD
Transformations, Actions and Shared Variables
Spark Operations on Hadoop YARN
Sequence File Processing
Partitioner and its role in Performance Improvement

Day 5

Spark Eco-system - Spark Streaming & Spark SQL

Introduction to Spark Streaming
Introduction to Spark SQL
Querying Files as Tables
Text file Format
JSON file Format
Parquet file Format
Hive and Spark SQL Architecture
Integrating Spark & Apache Hive
Spark SQL performance optimization
Implementing Data visualization in Spark

Audience

1. Professionals aspiring to a career in real-time Big Data analytics
2. Analytics and research professionals
3. IT developers and testers
4. Data scientists
5. BI and reporting professionals
6. Students who wish to gain a thorough understanding of Apache Spark

Sangeetha
