Become a Hadoop expert by mastering MapReduce, YARN, Pig, Hive, HBase, Oozie, Flume, and Sqoop while working on industry-based use cases and projects.
By the end of this Hadoop certification course, you will learn to:
Master the concepts of HDFS and MapReduce framework
Understand Hadoop 2.x Architecture
Set up a Hadoop cluster and write complex MapReduce programs
Learn data loading techniques using Sqoop and Flume
Perform data analytics using Pig, Hive and YARN
Implement HBase and MapReduce integration
Implement advanced HBase usage and indexing
Schedule jobs using Oozie
Implement best practices for Hadoop development
Work on a real-life big data analytics project
Understand Spark and its Ecosystem
Learn how to work with RDDs in Spark
Course Contents
Day 1
Introduction
The Motivation for Hadoop
Problems with Traditional Large-Scale Systems
Introducing Hadoop
Hadoopable Problems
Hadoop: Basic Concepts and HDFS
The Hadoop Project and Hadoop Components
The Hadoop Distributed File System
Introduction to MapReduce
MapReduce Overview
Example: WordCount
Mappers
Reducers
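To make the WordCount example above concrete, here is a minimal sketch of its mapper and reducer using the new (org.apache.hadoop.mapreduce) API. Class names are illustrative, and each class would live in its own .java file:

    // WordCountMapper.java: emits (word, 1) for each token in a line.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // WordCountReducer.java: sums the 1s emitted for each word.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCountReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }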
Day 2
Hadoop Clusters and the Hadoop Ecosystem
Hadoop Cluster Overview
Hadoop Jobs and Tasks
Other Hadoop Ecosystem Components
Writing a MapReduce Program in Java
Basic MapReduce API Concepts
Writing MapReduce Drivers, Mappers, and Reducers in Java
Speeding Up Hadoop Development by Using Eclipse
Differences Between the Old and New MapReduce APIs
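A typical new-API driver tying together the mapper and reducer sketched on Day 1 (names are carried over from that example):

    // WordCount.java: configures and submits the job.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(WordCountMapper.class);
            job.setCombinerClass(WordCountReducer.class); // safe: summing is associative
            job.setReducerClass(WordCountReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }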
Writing a MapReduce Program Using Streaming
Writing Mappers and Reducers with the Streaming API
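A streaming mapper or reducer is any executable that reads lines on stdin and writes tab-separated key/value pairs on stdout; scripts are the usual choice, but the same contract can be sketched in Java for consistency with the other examples here. The job is then launched through the streaming jar with flags such as -input, -output, -mapper, and -reducer:

    // StreamingWordMapper.java: the streaming equivalent of the Day 1 mapper.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class StreamingWordMapper {
        public static void main(String[] args) throws Exception {
            BufferedReader in =
                    new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = in.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        // Streaming convention: key, tab, value.
                        System.out.println(token + "\t1");
                    }
                }
            }
        }
    }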
Day 3
Unit Testing MapReduce Programs
Unit Testing
The JUnit and MRUnit Testing Frameworks
Writing Unit Tests with MRUnit
Running Unit Tests
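A minimal MRUnit test for the Day 1 mapper (JUnit 4 style; no cluster or HDFS is involved):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class WordCountMapperTest {
        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            mapDriver = MapDriver.newMapDriver(new WordCountMapper());
        }

        @Test
        public void emitsOnePairPerWord() throws Exception {
            mapDriver.withInput(new LongWritable(0), new Text("cat cat dog"))
                     .withOutput(new Text("cat"), new IntWritable(1))
                     .withOutput(new Text("cat"), new IntWritable(1))
                     .withOutput(new Text("dog"), new IntWritable(1))
                     .runTest(); // fails if the actual output differs
        }
    }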
Delving Deeper into the Hadoop API
Using the ToolRunner Class
Setting Up and Tearing Down Mappers and Reducers
Decreasing the Amount of Intermediate Data with Combiners
Accessing HDFS Programmatically
Using the Distributed Cache
Using the Hadoop API’s Library of Mappers, Reducers, and Partitioners
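A driver skeleton using ToolRunner (job setup is abbreviated; see the Day 2 driver for the full version). Implementing Tool means generic options such as -D key=value and -files, the latter of which ships files through the distributed cache, are parsed automatically:

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class WordCountTool extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already contains any -D overrides from the command line.
            Job job = Job.getInstance(getConf(), "word count");
            job.setJarByClass(WordCountTool.class);
            // ... mapper/reducer/output-type setup as in the Day 2 driver ...
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new WordCountTool(), args));
        }
    }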
Practical Development Tips and Techniques
Strategies for Debugging MapReduce Code
Testing MapReduce Code Locally by Using LocalJobRunner
Writing and Viewing Log Files
Retrieving Job Information with Counters
Reusing Objects
Creating Map-Only MapReduce Jobs
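Counters and map-only jobs combine naturally in a data-cleansing pass. A sketch (counter and class names are illustrative), with job.setNumReduceTasks(0) in the driver making the job map-only so mapper output goes straight to HDFS:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Drops blank lines, counting them, and passes everything else through.
    public class CleanseMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        enum Quality { BLANK_LINES }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().trim().isEmpty()) {
                // Appears in the job's counter report next to the built-ins.
                context.getCounter(Quality.BLANK_LINES).increment(1);
                return;
            }
            context.write(value, NullWritable.get());
        }
    }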
Day 4
Partitioners and Reducers
How Partitioners and Reducers Work Together
Determining the Optimal Number of Reducers for a Job
Writing Custom Partitioners
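A custom partitioner sketch: all keys starting with the same letter go to the same reducer. It would be registered in the driver with job.setPartitionerClass(FirstLetterPartitioner.class); the class name is illustrative:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String s = key.toString();
            if (s.isEmpty()) {
                return 0;
            }
            // Mask the sign bit so the partition number is never negative.
            return (Character.toLowerCase(s.charAt(0)) & Integer.MAX_VALUE)
                    % numPartitions;
        }
    }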
Data Input and Output
Creating Custom Writable and Writable Comparable Implementations
Saving Binary Data Using SequenceFile and Avro Data Files
Issues to Consider When Using File Compression
Implementing Custom InputFormats and OutputFormats
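A custom Writable value type (field names are illustrative); keys would additionally implement WritableComparable so they can be sorted during the shuffle:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class PageView implements Writable {
        private long timestamp;
        private int httpStatus;

        // Hadoop calls write()/readFields() to move the object between map
        // and reduce; fields must be read in the order they were written.
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(timestamp);
            out.writeInt(httpStatus);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            timestamp = in.readLong();
            httpStatus = in.readInt();
        }
    }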
Common MapReduce Algorithms
Sorting and Searching Large Data Sets
Indexing Data
Computing Term Frequency-Inverse Document Frequency (TF-IDF)
Calculating Word Co-Occurrence
Performing Secondary Sort
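For reference, one common TF-IDF weighting as a small helper (names and the exact variant are illustrative; courses and libraries differ on smoothing):

    public final class TfIdf {
        // tf-idf(t, d) = tf(t, d) * log(N / df(t))
        // tf       = raw count of term t in document d
        // docCount = N, total documents in the corpus
        // docFreq  = df(t), documents containing t (assumed > 0)
        static double tfIdf(long tf, long docCount, long docFreq) {
            return tf * Math.log((double) docCount / docFreq);
        }
    }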
Day 5
Joining Data Sets in MapReduce Jobs
Writing a Map-Side Join
Writing a Reduce-Side Join
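A reduce-side join reducer in sketch form. It assumes two mappers keyed their records by customer id and tagged each value "C" (customer) or "O" (order) followed by a tab; tags and record layout are illustrative, and buffering the order list in memory is fine for a sketch but would be replaced by a secondary sort at scale:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String customer = null;
            List<String> orders = new ArrayList<String>();
            for (Text value : values) {
                String[] parts = value.toString().split("\t", 2);
                if ("C".equals(parts[0])) {
                    customer = parts[1];
                } else {
                    orders.add(parts[1]);
                }
            }
            if (customer != null) { // inner join: drop unmatched orders
                for (String order : orders) {
                    context.write(key, new Text(customer + "\t" + order));
                }
            }
        }
    }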
Integrating Hadoop into the Enterprise Workflow
Integrating Hadoop into an Existing Enterprise
Loading Data from an RDBMS into HDFS by Using Sqoop
Managing Real-Time Data Using Flume
Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
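A typical Sqoop import command (the connection string, credentials file, table name, and target directory are illustrative):

    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username reports \
      --password-file /user/etl/.db-password \
      --table customers \
      --target-dir /data/sales/customers \
      --num-mappers 4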
An Introduction to Hive, Impala, and Pig
The Motivation for Hive, Impala, and Pig
Hive Overview
Impala Overview
Pig Overview
Choosing Between Hive, Impala, and Pig
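The same top-categories query in HiveQL and Pig Latin illustrates the trade-off between the two styles (table, relation, and column names are illustrative):

    -- Hive: declarative SQL over a predefined table
    SELECT category, COUNT(*) AS views
    FROM page_views
    GROUP BY category
    ORDER BY views DESC
    LIMIT 10;

    -- Pig: a step-by-step dataflow over raw files
    views  = LOAD '/data/page_views' AS (visitor:chararray, category:chararray);
    grpd   = GROUP views BY category;
    counts = FOREACH grpd GENERATE group AS category, COUNT(views) AS n;
    sorted = ORDER counts BY n DESC;
    top10  = LIMIT sorted 10;
    DUMP top10;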
An Introduction to Oozie
Introduction to Oozie
Creating Oozie Workflows
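A skeletal Oozie workflow running a single MapReduce action (names, paths, and the workflow schema version are illustrative; ${jobTracker} and ${nameNode} come from the job properties file submitted with the workflow):

    <workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.4">
        <start to="wordcount"/>
        <action name="wordcount">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>/data/input</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>/data/output</value>
                    </property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
        </kill>
        <end name="end"/>
    </workflow-app>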