Pentaho’s intuitive and powerful platform is built to tackle these challenges head-on, but delivering accelerated productivity and value-for-time is just the beginning. Pentaho helps teams manage complex data transformations and enables them to operationalize Hadoop as part of an end-to-end data pipeline, ensuring the delivery of governed analytics.
Preview
By the end of this training program, you will be familiar with:
An intuitive visual interface for integrating and blending Hadoop data with virtually any other source, including relational databases, NoSQL stores, enterprise applications, and more
The ability to design MapReduce jobs up to 15 times faster than hand-coding approaches
Native MapReduce integration that executes complex transformation and blending logic in-cluster, while scaling linearly with Hadoop
Deep integration with the Hadoop ecosystem that offers real control over YARN jobs, Spark execution, Oozie, Sqoop, and more
Automation to rapidly accelerate the ingestion and onboarding of hundreds or thousands of diverse and changing data sources into Hadoop
Support for leading Hadoop distributions, including Cloudera, Hortonworks, Amazon EMR, and MapR, with maximum portability of jobs and transformations between Hadoop platforms
Course Contents
Day 1
Basic Conceptual Architecture & Architectural Components
What is Pentaho?
Dimensional modeling
Dimensional design
Star schema: what and why (model)
Star schema: what and why (language)
Conformed dimensions
Additive vs. semi-additive facts
Snowflake schema
Star schema vs. snowflake schema
Slowly changing dimensions
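The slowly changing dimension topic above can be made concrete with a minimal sketch of SCD type 2 logic in Python. This is a hand-rolled simulation for illustration only; in PDI the Dimension lookup/update step implements this behavior for you, and the field names here are invented:

```python
from datetime import date

def apply_scd2(dimension, key, new_attrs, today):
    """SCD type 2 sketch: keep full history by closing the current row
    and inserting a new version when a tracked attribute changes.
    dimension: list of dicts with 'key', 'attrs', 'valid_from',
    'valid_to', 'current'."""
    for row in dimension:
        if row["key"] == key and row["current"]:
            if row["attrs"] == new_attrs:
                return dimension          # nothing changed, keep the row
            row["current"] = False        # close out the old version
            row["valid_to"] = today
            break
    dimension.append({"key": key, "attrs": new_attrs,
                      "valid_from": today, "valid_to": None, "current": True})
    return dimension

dim = [{"key": 1, "attrs": {"city": "Pune"},
        "valid_from": date(2020, 1, 1), "valid_to": None, "current": True}]
apply_scd2(dim, 1, {"city": "Mumbai"}, date(2021, 6, 1))
# dim now holds two rows for key 1: the closed-out Pune version and a
# current Mumbai version, preserving history.
```

Contrast this with SCD type 1, which would simply overwrite the city in place and lose the old value.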
Clustering in Pentaho
Files in Pentaho
Spoon.bat
Pan.bat
Kitchen.bat
Carte.bat
Encr.bat
Pentaho Spoon transformations and their steps
How to create a database connection
How to move data from CSV file input to table output
How to move data from CSV file input to Microsoft Excel output
How to move data from Microsoft Excel input to Write to log
Data Grid
Generate rows
Data transformation: changing data from one form to another
How to add constant
How to add sequence
Add value fields changing sequence
How the Calculator step works in Pentaho
Number range in Pentaho
Replace in string
Select values: selecting field values
Set field value to a constant
Sort rows
Split field to rows
String operations and Strings cut
Unique rows
Unique rows (hash set)
Value mapper
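Several of the row-level steps listed above (Add constants, Add sequence, Unique rows, Value mapper) can be mimicked in plain Python to show what each one does to a stream of rows. This is a conceptual sketch only; in Spoon these are drag-and-drop steps, and the field names here are invented:

```python
rows = [{"code": "A"}, {"code": "B"}, {"code": "A"}]

# Add constants: attach a fixed field to every row.
for r in rows:
    r["source"] = "csv"

# Add sequence: attach an incrementing counter field.
for i, r in enumerate(rows, start=1):
    r["seq"] = i

# Value mapper: translate one field's values via a lookup table,
# with a default for unmapped values.
mapping = {"A": "Alpha", "B": "Beta"}
for r in rows:
    r["code_name"] = mapping.get(r["code"], "unknown")

# Unique rows: drop consecutive duplicates on a key field. Like the
# PDI step, this assumes the stream is sorted on that key first.
rows.sort(key=lambda r: r["code"])
unique, last = [], object()
for r in rows:
    if r["code"] != last:
        unique.append(r)
        last = r["code"]
# unique now holds one row per distinct code.
```

The sorted-input requirement is why Sort rows usually precedes Unique rows in a transformation; Unique rows (HashSet) lifts that requirement at the cost of memory.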
Flow
Revision of SSH command
How to handle null values in Pentaho
Mail in Pentaho
Error handling in Pentaho
Filter rows
Prioritize streams
Day 2
Implementation of SCD & Jobs
Revision of SCD type 2
Jobs
Differences between jobs and transformations
How to make ETL dynamic
How to make transformation dynamic
File management
Create folder
Conditions
Scripting
Bulk loading
XML
Utility
Repository
File transfer
File encryption
Types of Repository in Pentaho
How to make ETL dynamic
Difference between parameter and variable
How to pass a variable from a job to a transformation
How to use a parameter within a transformation
How to set and get the value from a job to a transformation
Environment variables
Functionality of the repository in Pentaho
Database connection
Repository import
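The parameter/variable distinction covered above can be sketched in Python. The names are purely illustrative (this is not PDI's API): in PDI, a parameter is declared by a transformation with a default and can be overridden by the caller, while a variable is set into a scope (such as the parent job or the JVM) and read downstream:

```python
# Illustrative sketch of PDI's parameter vs. variable semantics.
job_scope = {}  # stands in for a job-level variable scope

def job():
    # "Set variable": the job places a value into shared scope.
    job_scope["INPUT_DIR"] = "/data/in"
    # Parameter: passed explicitly when the transformation is called.
    return transformation(rows_limit=10)

def transformation(rows_limit=100):        # parameter with a default
    # "Get variable": read what an upstream job set into scope.
    input_dir = job_scope.get("INPUT_DIR")
    return input_dir, rows_limit

result = job()
# result == ("/data/in", 10): the variable came from the job's scope,
# the parameter from the explicit call site.
```

The practical consequence mirrors PDI: parameters are visible and documented on the transformation itself, while variables are invisible dependencies on whatever the caller set, which is why parameters are preferred for required inputs.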
Pentaho Repository & Report Designing
Basic operational report/dashboard
Row banding effect in Pentaho
Pentaho report designing
How to publish the report/dashboard
What the Pentaho BI Server looks like
How to create bar charts, pie charts, and line charts in Pentaho
Design limitations of the product
Sub-reports/dashboards in Pentaho
Ad hoc reporting/dashboards
How to pass parameter in report/dashboard?
Drill down report/dashboard
How to create report/dashboard using cubes?
How to create report/dashboard using excel sheet?
How to create report/dashboard using Pentaho Data Integration
Day 3
What is a cube?
What are the benefits of cubes?
How to create a cube?
How to deploy a cube?
How to create a report/dashboard using a cube?
What is MDX?
MDX basics
MDX in practice
Cells
Tuples
Tuples: implicitly added aggregation members
Tuples: implicit dimensions
Sets in MDX
Selects
Referencing dimensions, levels, and members
Member referencing
Positional
Hierarchical navigation
Hierarchical navigation: miscellaneous
Functions
Metadata
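The cell/tuple/set vocabulary in the MDX topics above can be made concrete with a tiny in-memory "cube" in Python. This is an analogy, not Mondrian's API: a tuple names one member per dimension, a cell is the value stored at that coordinate, and a set is an ordered list of tuples:

```python
# A toy cube over two dimensions: (product, year) -> sales measure.
cube = {
    ("Bikes", 2014): 100, ("Bikes", 2015): 120,
    ("Cars", 2014): 300, ("Cars", 2015): 280,
}

# A tuple names one member per dimension; the cell is the value at
# that coordinate, e.g. cube[("Bikes", 2015)] is the cell 120.
bikes_2015 = cube[("Bikes", 2015)]

# A set is an ordered list of tuples; evaluating it yields cells,
# roughly what an MDX SELECT does along an axis.
year_2015 = [(p, 2015) for p in ("Bikes", "Cars")]
cells = [cube[t] for t in year_2015]           # [120, 280]

# An implicitly added aggregation member ("all products") rolls the
# set up with the measure's aggregator, here a plain sum.
all_products_2015 = sum(cells)                 # 400
```

In real MDX the same query would be written against named hierarchies, e.g. selecting `[Product].Members` on rows with `[Time].[2015]` on the slicer, and the All member would be computed by the engine rather than by hand.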
Creating Pentaho Analyzer Report
ETL Connectivity with Hadoop Ecosystem
How ETL tools work in the big data industry
Connecting to HDFS from ETL tool and moving data from Local system to HDFS
Moving Data from DBMS to HDFS
Working with Hive with ETL Tool
Creating a MapReduce job in an ETL tool
End to End ETL PoC showing Hadoop integration with ETL tool
Creating dashboards in Pentaho, version 5.3.x
Project – Pentaho Interactive Report
Data – Sales, Customer, Product
Problem Statement – How to create an interactive report in Pentaho, which includes the following actions:
How to create a data source
Managing data sources
Formatting the report
How to change the report template
Scheduling, etc.
Day 4
Introduction to Hadoop and its Ecosystem, Map Reduce and HDFS
Big Data, Factors constituting Big Data
Hadoop and Hadoop Ecosystem
Map Reduce – Concepts of Map, Reduce, Ordering, Concurrency, and Shuffle
Hadoop Distributed File System (HDFS) Concepts and their Importance
Deep Dive in Map Reduce – Execution Framework, Partitioner, Combiner, Data Types, Key-Value Pairs
HDFS Deep Dive – Architecture, Data Replication, NameNode, DataNode, Data Flow
Parallel Copying with DistCp, Hadoop Archives
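The map/shuffle/reduce phases in the list above can be simulated end-to-end in a few lines of Python. This is a single-process sketch of the data flow only; real Hadoop distributes each phase across the cluster and spills intermediate data to disk:

```python
from collections import defaultdict

lines = ["big data big cluster", "data flows in the cluster"]

# Map phase: each input line is turned into (key, value) pairs,
# here (word, 1) for a word count.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: pairs are grouped (and effectively sorted) by key,
# so that each reducer sees every value emitted for one key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: each key's list of values is folded into one result.
counts = {key: sum(values) for key, values in sorted(groups.items())}
# counts == {'big': 2, 'cluster': 2, 'data': 2, 'flows': 1, 'in': 1, 'the': 1}
```

A combiner, covered in the deep dive, would run the same `sum` on each mapper's local output before the shuffle to cut network traffic; the final counts are unchanged because addition is associative.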
Hands on Exercises
1. Installing Hadoop in Pseudo-Distributed Mode, Understanding Important Configuration Files, their Properties and Daemon Threads
2. Accessing HDFS from Command Line
3. Map Reduce – Basic Exercises
4. Understanding Hadoop Eco-system
1. Introduction to Sqoop, use cases and Installation
2. Introduction to Hive, use cases and Installation
3. Introduction to Pig, use cases and Installation
4. Introduction to Oozie, use cases and Installation
5. Introduction to Flume, use cases and Installation
6. Introduction to YARN
Deep Dive in Map Reduce
How to develop a MapReduce application and write unit tests
Best practices for developing, writing, and debugging MapReduce applications
Joining Data sets in Map Reduce
Hive
1. Introduction to Hive
What Is Hive?
Hive Schema and Data Storage
Comparing Hive to Traditional Databases
Hive vs. Pig
Hive Use Cases
Interacting with Hive
2. Relational Data Analysis with Hive
Hive Databases and Tables
Basic HiveQL Syntax
Data Types
Joining Data Sets
Common Built-in Functions
Hive Data Management
Hive Data Formats
Creating Databases and Hive-Managed Tables
Loading Data into Hive
Altering Databases and Tables
Self-Managed Tables
Simplifying Queries with Views
Storing Query Results
Controlling Access to Data
Hive Optimization
Understanding Query Performance
Partitioning
Bucketing
Indexing Data
Extending Hive
User-Defined Functions
Optimizing Queries, Tips and Tricks for Performance Tuning
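The partitioning and bucketing ideas in the Hive optimization topics above can be illustrated in Python: partitioning routes rows into one directory per column value, while bucketing hashes a column into a fixed number of files. This is a conceptual sketch only; in Hive these are declared with `PARTITIONED BY` and `CLUSTERED BY ... INTO n BUCKETS`, and Hive uses its own hash function rather than the stand-in below:

```python
rows = [
    {"id": 1, "country": "IN"}, {"id": 2, "country": "US"},
    {"id": 3, "country": "IN"}, {"id": 4, "country": "US"},
]

# Partitioning: one "directory" per distinct column value, so a query
# filtered on country only scans the matching partition (partition
# pruning) instead of the whole table.
partitions = {}
for r in rows:
    partitions.setdefault(r["country"], []).append(r)

# Bucketing: a deterministic hash of the column modulo the bucket
# count assigns each row to a fixed file, enabling efficient sampling
# and bucketed map-side joins.
NUM_BUCKETS = 2
buckets = {b: [] for b in range(NUM_BUCKETS)}
for r in rows:
    buckets[r["id"] % NUM_BUCKETS].append(r)  # id % 2 stands in for Hive's hash
```

Partitioning works best on low-cardinality columns like country or date; bucketing is the tool for high-cardinality keys like id, where one partition per value would create millions of tiny directories.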
Day 5
Pig
1. Introduction to Pig
What Is Pig?
Pig’s Features
Pig Use Cases
Interacting with Pig
2. Basic Data Analysis with Pig
Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly-Used Functions
3. Processing Complex Data with Pig
Complex/Nested Data Types
Grouping
Iterating Grouped Data
4. Multi-Dataset Operations with Pig
Techniques for Combining Data Sets
Joining Data Sets in Pig
Set Operations
Splitting Data Sets
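The joining and splitting operations above have a direct mental model in plain Python. The sketch below mirrors what Pig's JOIN and SPLIT operators do over relations of tuples; it is not Pig Latin syntax, and the relation names are invented:

```python
customers = [(1, "Asha"), (2, "Ben"), (3, "Chen")]          # (id, name)
orders    = [(101, 1, 250), (102, 1, 90), (103, 3, 40)]     # (oid, cust_id, amount)

# Like: JOIN customers BY id, orders BY cust_id
# An inner join: only keys present in both relations survive,
# so customer 2 (no orders) is dropped.
joined = [(cid, name, oid, amount)
          for cid, name in customers
          for oid, ocid, amount in orders
          if cid == ocid]

# Like: SPLIT orders INTO big IF amount >= 100, small IF amount < 100
big   = [o for o in orders if o[2] >= 100]
small = [o for o in orders if o[2] < 100]
```

Pig would distribute the join via the shuffle (grouping both relations on the join key), whereas this nested loop evaluates it in memory; the resulting relations are the same.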
5. Extending Pig
Macros and Imports
UDFs
Using Other Languages to Process Data with Pig
Pig Jobs
Impala
Introduction to Impala
What is Impala?
How Impala Differs from Hive and Pig
How Impala Differs from Relational Databases
Limitations and Future Directions
Using the Impala Shell
Choosing the Best Tool (Hive, Pig, or Impala)
Job and Certification Support
Major Project, Hadoop Development, Cloudera Certification Tips and Guidance, Mock Interview Preparation, Practical Development Tips and Techniques, and Certification Preparation