Pentaho’s intuitive and powerful platform is built to tackle these challenges head-on, but delivering accelerated productivity and value-for-time is just the beginning. Pentaho helps teams manage complex data transformations and enables them to operationalize Hadoop as part of an end-to-end data pipeline, ensuring the delivery of governed analytics.
Preview
By the end of this training program, you will be familiar with:
An intuitive visual interface for integrating and blending Hadoop data with virtually any other source, including relational databases, NoSQL stores, enterprise applications, and more
The ability to design MapReduce jobs up to 15 times faster than hand-coding approaches
Native MapReduce integration that executes complex transformation and blending logic in-cluster, while scaling linearly with Hadoop
Deep integration with the Hadoop ecosystem that offers real control over YARN jobs, Spark execution, Oozie, Sqoop, and more
Automation to rapidly accelerate the ingestion and onboarding of hundreds or thousands of diverse and changing data sources into Hadoop
Support for leading Hadoop distributions, including Cloudera, Hortonworks, Amazon EMR, and MapR, with maximum portability of jobs and transformations between Hadoop platforms
Course Contents
Day 1
Basic Conceptual Architecture & Architectural Components
What is Pentaho?
Dimensional modeling
Dimensional design
Star schema: what and why (model)
Star schema: what and why (language)
Conformed dimensions
Additive vs. semi-additive facts
Snowflake schema
Star schema vs. snowflake schema
Slowly changing dimensions
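The slowly changing dimension topic above can be made concrete with a minimal sketch of SCD type 2 logic in Python. This is a hand-rolled simulation for illustration only; in PDI the Dimension lookup/update step implements this behavior for you, and the field names here are invented:

```python
from datetime import date

def apply_scd2(dimension, key, new_attrs, today):
    """SCD type 2 sketch: keep full history by closing the current row
    and inserting a new version when a tracked attribute changes.
    dimension: list of dicts with 'key', 'attrs', 'valid_from',
    'valid_to', 'current'."""
    for row in dimension:
        if row["key"] == key and row["current"]:
            if row["attrs"] == new_attrs:
                return dimension          # nothing changed, keep the row
            row["current"] = False        # close out the old version
            row["valid_to"] = today
            break
    dimension.append({"key": key, "attrs": new_attrs,
                      "valid_from": today, "valid_to": None, "current": True})
    return dimension

dim = [{"key": 1, "attrs": {"city": "Pune"},
        "valid_from": date(2020, 1, 1), "valid_to": None, "current": True}]
apply_scd2(dim, 1, {"city": "Mumbai"}, date(2021, 6, 1))
# dim now holds two rows for key 1: the closed-out Pune version and a
# current Mumbai version, preserving history.
```

Contrast this with SCD type 1, which would simply overwrite the city in place and lose the old value.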
Clustering in Pentaho
Files in Pentaho
Spoon.bat
Pan.bat
Kitchen.bat
Carte.bat
Encr.bat
Pentaho Spoon transformations and their steps
How to create a database connection
How to move data from CSV file input to table output
How to move data from CSV file input to Microsoft Excel output
How to move data from Microsoft Excel input to Write to log
Data Grid
Generate rows
Data transformation: changing data from one form to another
How to add constant
How to add sequence
Add value fields changing sequence
How the Calculator step works in Pentaho
Number range in Pentaho
Replace in string
Select values: selecting field values
Set field value to a constant
Sort rows
Split field to rows
String operations and Strings cut
Unique rows
Unique rows (hash set)
Value mapper
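Several of the row-level steps listed above (Add constants, Add sequence, Unique rows, Value mapper) can be mimicked in plain Python to show what each one does to a stream of rows. This is a conceptual sketch only; in Spoon these are drag-and-drop steps, and the field names here are invented:

```python
rows = [{"code": "A"}, {"code": "B"}, {"code": "A"}]

# Add constants: attach a fixed field to every row.
for r in rows:
    r["source"] = "csv"

# Add sequence: attach an incrementing counter field.
for i, r in enumerate(rows, start=1):
    r["seq"] = i

# Value mapper: translate one field's values via a lookup table,
# with a default for unmapped values.
mapping = {"A": "Alpha", "B": "Beta"}
for r in rows:
    r["code_name"] = mapping.get(r["code"], "unknown")

# Unique rows: drop consecutive duplicates on a key field. Like the
# PDI step, this assumes the stream is sorted on that key first.
rows.sort(key=lambda r: r["code"])
unique, last = [], object()
for r in rows:
    if r["code"] != last:
        unique.append(r)
        last = r["code"]
# unique now holds one row per distinct code.
```

The sorted-input requirement is why Sort rows usually precedes Unique rows in a transformation; Unique rows (HashSet) lifts that requirement at the cost of memory.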
Flow
Revision of SSH command
How to handle null values in Pentaho
Mail in Pentaho
Error handling in Pentaho
Filter rows
Prioritize streams
Day 2
Implementation of SCD & Jobs
Revision of SCD type 2
Jobs
Differences between jobs and transformations
How to make ETL dynamic
How to make transformation dynamic
File management
Create folder
Conditions
Scripting
Bulk loading
XML
Utility
Repository
File transfer
File encryption
Types of Repository in Pentaho
How to make ETL dynamic
Difference between parameter and variable
How to pass a variable from a job to a transformation
How to use a parameter within a transformation
How to set and get the value from a job to a transformation
Environment variables
Functionality of the repository in Pentaho
Database connection
Repository import
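The parameter/variable distinction covered above can be sketched in Python. The names are purely illustrative (this is not PDI's API): in PDI, a parameter is declared by a transformation with a default and can be overridden by the caller, while a variable is set into a scope (such as the parent job or the JVM) and read downstream:

```python
# Illustrative sketch of PDI's parameter vs. variable semantics.
job_scope = {}  # stands in for a job-level variable scope

def job():
    # "Set variable": the job places a value into shared scope.
    job_scope["INPUT_DIR"] = "/data/in"
    # Parameter: passed explicitly when the transformation is called.
    return transformation(rows_limit=10)

def transformation(rows_limit=100):        # parameter with a default
    # "Get variable": read what an upstream job set into scope.
    input_dir = job_scope.get("INPUT_DIR")
    return input_dir, rows_limit

result = job()
# result == ("/data/in", 10): the variable came from the job's scope,
# the parameter from the explicit call site.
```

The practical consequence mirrors PDI: parameters are visible and documented on the transformation itself, while variables are invisible dependencies on whatever the caller set, which is why parameters are preferred for required inputs.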
Pentaho Repository & Report Designing
Basic operational report/dashboard
Row banding effect in Pentaho
Pentaho report designing
How to publish the report/dashboard
What the Pentaho BI Server looks like
How to create bar charts, pie charts, and line charts in Pentaho
Design limitations of the product
Sub-reports/dashboards in Pentaho
Ad hoc reporting/dashboards
How to pass parameter in report/dashboard?
Drill down report/dashboard
How to create report/dashboard using cubes?
How to create report/dashboard using excel sheet?
How to create report/dashboard using Pentaho Data Integration
Day 3
What is a cube?
What are the benefits of cubes?
How to create a cube?
How to deploy a cube?
How to create a report/dashboard using a cube?
What is MDX?
MDX basics
MDX in practice
Cells
Tuples
Tuples: implicitly added aggregation members
Tuples: implicit dimensions
Sets in MDX
Selects
Referencing dimensions, levels, and members
Member referencing
Positional
Hierarchical navigation
Hierarchical navigation: miscellaneous
Functions
Metadata
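The cell/tuple/set vocabulary in the MDX topics above can be made concrete with a tiny in-memory "cube" in Python. This is an analogy, not Mondrian's API: a tuple names one member per dimension, a cell is the value stored at that coordinate, and a set is an ordered list of tuples:

```python
# A toy cube over two dimensions: (product, year) -> sales measure.
cube = {
    ("Bikes", 2014): 100, ("Bikes", 2015): 120,
    ("Cars", 2014): 300, ("Cars", 2015): 280,
}

# A tuple names one member per dimension; the cell is the value at
# that coordinate, e.g. cube[("Bikes", 2015)] is the cell 120.
bikes_2015 = cube[("Bikes", 2015)]

# A set is an ordered list of tuples; evaluating it yields cells,
# roughly what an MDX SELECT does along an axis.
year_2015 = [(p, 2015) for p in ("Bikes", "Cars")]
cells = [cube[t] for t in year_2015]           # [120, 280]

# An implicitly added aggregation member ("all products") rolls the
# set up with the measure's aggregator, here a plain sum.
all_products_2015 = sum(cells)                 # 400
```

In real MDX the same query would be written against named hierarchies, e.g. selecting `[Product].Members` on rows with `[Time].[2015]` on the slicer, and the All member would be computed by the engine rather than by hand.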
Creating Pentaho Analyzer Report
ETL Connectivity with Hadoop Ecosystem
How ETL tools work in the big data industry
Connecting to HDFS from ETL tool and moving data from Local system to HDFS
Moving Data from DBMS to HDFS
Working with Hive with ETL Tool
Creating a MapReduce job in an ETL tool
End to End ETL PoC showing Hadoop integration with ETL tool
Creating dashboards in Pentaho, version 5.3.x
Project – Pentaho Interactive Report
Data – Sales, Customer, Product
Problem Statement – How to create an interactive report in Pentaho, which includes the following actions:
How to create a data source
Managing data sources
Formatting the report
How to change the report template
Scheduling, etc.
Day 4
Introduction to Hadoop and its Ecosystem, Map Reduce and HDFS
Big Data, Factors constituting Big Data
Hadoop and Hadoop Ecosystem
Map Reduce – Concepts of Map, Reduce, Ordering, Concurrency, and Shuffle
Hadoop Distributed File System (HDFS) Concepts and their Importance
Deep Dive in Map Reduce – Execution Framework, Partitioner, Combiner, Data Types, Key-Value Pairs
HDFS Deep Dive – Architecture, Data Replication, NameNode, DataNode, Data Flow
Parallel Copying with DistCp, Hadoop Archives
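The map/shuffle/reduce phases in the list above can be simulated end-to-end in a few lines of Python. This is a single-process sketch of the data flow only; real Hadoop distributes each phase across the cluster and spills intermediate data to disk:

```python
from collections import defaultdict

lines = ["big data big cluster", "data flows in the cluster"]

# Map phase: each input line is turned into (key, value) pairs,
# here (word, 1) for a word count.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: pairs are grouped (and effectively sorted) by key,
# so that each reducer sees every value emitted for one key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: each key's list of values is folded into one result.
counts = {key: sum(values) for key, values in sorted(groups.items())}
# counts == {'big': 2, 'cluster': 2, 'data': 2, 'flows': 1, 'in': 1, 'the': 1}
```

A combiner, covered in the deep dive, would run the same `sum` on each mapper's local output before the shuffle to cut network traffic; the final counts are unchanged because addition is associative.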
Hands on Exercises
1. Installing Hadoop in Pseudo-Distributed Mode, Understanding Important Configuration Files, their Properties and Daemon Threads
2. Accessing HDFS from Command Line
3. Map Reduce – Basic Exercises
4. Understanding Hadoop Eco-system
1. Introduction to Sqoop, use cases and Installation
2. Introduction to Hive, use cases and Installation
3. Introduction to Pig, use cases and Installation
4. Introduction to Oozie, use cases and Installation
5. Introduction to Flume, use cases and Installation
6. Introduction to YARN
Deep Dive in Map Reduce
How to develop a MapReduce application and write unit tests
Best practices for developing, writing, and debugging MapReduce applications
Joining Data sets in Map Reduce
Hive
1. Introduction to Hive
What Is Hive?
Hive Schema and Data Storage
Comparing Hive to Traditional Databases
Hive vs. Pig
Hive Use Cases
Interacting with Hive
2. Relational Data Analysis with Hive
Hive Databases and Tables
Basic HiveQL Syntax
Data Types
Joining Data Sets
Common Built-in Functions
Hive Data Management
Hive Data Formats
Creating Databases and Hive-Managed Tables
Loading Data into Hive
Altering Databases and Tables
Self-Managed Tables
Simplifying Queries with Views
Storing Query Results
Controlling Access to Data
Hive Optimization
Understanding Query Performance
Partitioning
Bucketing
Indexing Data
Extending Hive
User-Defined Functions
Optimizing Queries, Tips and Tricks for Performance Tuning
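The partitioning and bucketing ideas in the Hive optimization topics above can be illustrated in Python: partitioning routes rows into one directory per column value, while bucketing hashes a column into a fixed number of files. This is a conceptual sketch only; in Hive these are declared with `PARTITIONED BY` and `CLUSTERED BY ... INTO n BUCKETS`, and Hive uses its own hash function rather than the stand-in below:

```python
rows = [
    {"id": 1, "country": "IN"}, {"id": 2, "country": "US"},
    {"id": 3, "country": "IN"}, {"id": 4, "country": "US"},
]

# Partitioning: one "directory" per distinct column value, so a query
# filtered on country only scans the matching partition (partition
# pruning) instead of the whole table.
partitions = {}
for r in rows:
    partitions.setdefault(r["country"], []).append(r)

# Bucketing: a deterministic hash of the column modulo the bucket
# count assigns each row to a fixed file, enabling efficient sampling
# and bucketed map-side joins.
NUM_BUCKETS = 2
buckets = {b: [] for b in range(NUM_BUCKETS)}
for r in rows:
    buckets[r["id"] % NUM_BUCKETS].append(r)  # id % 2 stands in for Hive's hash
```

Partitioning works best on low-cardinality columns like country or date; bucketing is the tool for high-cardinality keys like id, where one partition per value would create millions of tiny directories.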
Day 5
Pig
1. Introduction to Pig
What Is Pig?
Pig’s Features
Pig Use Cases
Interacting with Pig
2. Basic Data Analysis with Pig
Pig Latin Syntax
Loading Data
Simple Data Types
Field Definitions
Data Output
Viewing the Schema
Filtering and Sorting Data
Commonly-Used Functions
3. Processing Complex Data with Pig
Complex/Nested Data Types
Grouping
Iterating Grouped Data
4. Multi-Dataset Operations with Pig
Techniques for Combining Data Sets
Joining Data Sets in Pig
Set Operations
Splitting Data Sets
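The joining and splitting operations above have a direct mental model in plain Python. The sketch below mirrors what Pig's JOIN and SPLIT operators do over relations of tuples; it is not Pig Latin syntax, and the relation names are invented:

```python
customers = [(1, "Asha"), (2, "Ben"), (3, "Chen")]          # (id, name)
orders    = [(101, 1, 250), (102, 1, 90), (103, 3, 40)]     # (oid, cust_id, amount)

# Like: JOIN customers BY id, orders BY cust_id
# An inner join: only keys present in both relations survive,
# so customer 2 (no orders) is dropped.
joined = [(cid, name, oid, amount)
          for cid, name in customers
          for oid, ocid, amount in orders
          if cid == ocid]

# Like: SPLIT orders INTO big IF amount >= 100, small IF amount < 100
big   = [o for o in orders if o[2] >= 100]
small = [o for o in orders if o[2] < 100]
```

Pig would distribute the join via the shuffle (grouping both relations on the join key), whereas this nested loop evaluates it in memory; the resulting relations are the same.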
5. Extending Pig
Macros and Imports
UDFs
Using Other Languages to Process Data with Pig
Pig Jobs
Impala
Introduction to Impala
What is Impala?
How Impala Differs from Hive and Pig
How Impala Differs from Relational Databases
Limitations and Future Directions
Using the Impala Shell
Choosing the Best Tool (Hive, Pig, or Impala)
Job and Certification Support
Major Project, Hadoop Development, Cloudera Certification Tips and Guidance, Mock Interview Preparation, Practical Development Tips and Techniques, and Certification Preparation