3.75 out of 5
44 reviews on Udemy

Building a Data Mart with Pentaho Data Integration

A step-by-step tutorial that takes you through the creation of an ETL process to populate a Kimball-style star schema
Instructor:
Packt Publishing
435 students enrolled
English [Auto-generated]
Create a star schema
Populate and maintain slowly changing dimensions type 1 and type 2
Load fact and dimension tables in an efficient manner
Use a columnar database to store the data for the star schema
Analyze the quality of the data in an agile manner
Implement logging and scheduling for the ETL process
Get an overview of the whole process: from source data to the end user analyzing the data
Learn how to auto-generate data for a date dimension

Companies store a lot of data, but in most cases, it is not available in a format that makes it easily accessible for analysis and reporting tools. Ralph Kimball realized this a long time ago, so he paved the way for the star schema.

Building a Data Mart with Pentaho Data Integration walks you through the creation of an ETL process to build a data mart for a fictional company. This course will show you, step by step, how to source the raw data and prepare it for the star schema. The practical approach of this course will get you up and running quickly, and will explain the key concepts in an easy-to-understand manner.

Building a Data Mart with Pentaho Data Integration teaches you how to source raw data with Pentaho Kettle and transform it so that the output is a Kimball-style star schema. After sourcing the raw data with our ETL process, you will quality-check the data using an agile approach. Next, you will learn how to load slowly changing dimensions and the fact table. The star schema will reside in a column-oriented database, so you will learn about bulk-loading the data whenever possible. You will also learn how to create an OLAP schema and easily analyze the output of your ETL process.
By covering all the essential topics in a hands-on approach, this course puts you in a position to create your own ETL processes within a short span of time.

Getting Started

1
The Second-hand Lens Store Sample Data

Get an insight into the raw data we will be working with in this video tutorial.

2
The Derived Star Schema

Create a Star Schema derived from the raw data.
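
A star schema pairs one fact table with surrounding dimension tables, joined via surrogate keys. A minimal sketch of that structure, using SQLite for illustration; the table and column names here are invented for the example and are not the course's actual schema:

```python
import sqlite3

# Minimal star-schema sketch: one fact table referencing two dimensions.
# Names (dim_date, dim_product, fact_sales) are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_tk   INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240131
    full_date TEXT, year INTEGER, month INTEGER, day INTEGER
);
CREATE TABLE dim_product (
    product_tk   INTEGER PRIMARY KEY,
    product_name TEXT, category TEXT
);
CREATE TABLE fact_sales (
    date_tk    INTEGER REFERENCES dim_date(date_tk),
    product_tk INTEGER REFERENCES dim_product(product_tk),
    quantity   INTEGER, revenue REAL  -- additive measures
);
""")
tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```

The fact table holds only foreign keys and additive measures; every descriptive attribute lives in a dimension, which is what makes the schema easy to slice for analysis.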

3
Setting up Our Development Environment

We will create the required databases for our project, add JDBC drivers to PDI, and create JNDI connections.

Agile BI – Creating ETLs to Prepare Joined Data Set

1
Importing Raw Data

Create an ETL transformation that imports your raw data so that you can apply further manipulation down the stream and output the data to the data mart.

2
Exporting Data Using the Standard Table Output Step

We will learn how to easily make sure that the data types of the ETL output step are in sync with the database table column types.

3
Exporting Data Using the Dedicated Bulk Loading Step

Loading huge amounts of data in the traditional way takes too long; speed it up by using the bulk loader.
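
The idea behind bulk loading is to cut per-row round trips by sending many rows in one operation. A small sketch of the batched pattern, using SQLite's `executemany` as a stand-in; a real columnar database would use its own dedicated bulk loader:

```python
import sqlite3

# Batched loading sketch: one statement executed over many rows,
# instead of one INSERT round trip per record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, amount REAL)")
rows = [(i, i * 1.5) for i in range(10_000)]  # illustrative data

conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
conn.commit()
loaded = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
```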

Agile BI – Building OLAP Schema, Analyzing Data, and Implementing Required ETL I

1
Creating a Pentaho Analysis Model

In this first step to Agile ETL development, you will learn how to create a Pentaho Analysis Model so that you can analyze the data later on in Pentaho Analyzer.

2
Analyzing the Data Using the Pentaho Analyzer

A very important point is to understand the quality of the data: are there any duplicates, misspellings, and so on? We will find such problems and feed this new knowledge back into the ETL design.

3
Improving Your ETL for Better Data Quality

Learn how to implement ETL improvements to iron out the data problems found.

Slowly Changing Dimensions

1
Creating a Slowly Changing Dimension of Type 1 Using the Insert/Update Step

Learn how to populate a simple dimension.
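
An SCD Type 1 load overwrites changed attributes in place, keeping no history. A minimal sketch of that insert/update logic in plain Python; the field names are illustrative, and in the course this is done with PDI's Insert/Update step rather than code:

```python
# SCD Type 1 sketch: new members are inserted, changed members are
# overwritten in place -- the old attribute values are lost.
dimension = {}  # natural_key -> attribute dict

def upsert_type1(row):
    key = row["customer_id"]
    if key not in dimension:
        dimension[key] = dict(row)   # insert a new member
    elif dimension[key] != row:
        dimension[key].update(row)   # update: previous values are gone

upsert_type1({"customer_id": 1, "city": "Berlin"})
upsert_type1({"customer_id": 1, "city": "Hamburg"})  # overwrites Berlin
```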

2
Creating a Slowly Changing Dimension of Type 1 Using the Dimension Lookup/Update Step

Learn how to populate a simple dimension and make it future proof.

3
Creating a Slowly Changing Dimension Type 2

Learn how to keep historic versions in your dimension table.
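
In contrast to Type 1, a Type 2 load closes the current row and appends a new version, so history survives. A sketch of that logic, assuming illustrative validity columns (valid_from/valid_to); PDI's Dimension lookup/update step manages equivalent columns for you:

```python
from datetime import date

# SCD Type 2 sketch: changed members get a new row; the old row is
# closed by setting its valid_to date. Column names are illustrative.
history = []  # list of dimension row versions

def upsert_type2(natural_key, attrs, load_date):
    current = next((r for r in history
                    if r["key"] == natural_key and r["valid_to"] is None),
                   None)
    if current and current["attrs"] == attrs:
        return                            # unchanged: nothing to do
    if current:
        current["valid_to"] = load_date   # close the old version
    history.append({"key": natural_key, "attrs": attrs,
                    "valid_from": load_date, "valid_to": None})

upsert_type2(1, {"city": "Berlin"}, date(2024, 1, 1))
upsert_type2(1, {"city": "Hamburg"}, date(2024, 6, 1))
```

After the second load, both versions exist: the Berlin row is closed as of 2024-06-01 and the Hamburg row is the open, current one.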

Populating the Date Dimension

1
Defining Start and End Date Parameters

To make our date dimension transformation more dynamic, we will allow users to define a start and end date that specify the period.

2
Auto-generating Daily Rows for a Given Date Period

Based on the provided parameters, the number of days between the start and end date will be calculated. This figure will be used to generate a data set with the same number of rows.

3
Auto-generating Year, Month, and Day

In this part, you will learn how to derive various date attributes, such as the year, month, and day, from the input date.
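
The three steps above can be sketched together in a few lines: take start/end parameters, compute the number of days, generate one row per day, and derive the date attributes. A plain-Python sketch; in the course this is built from PDI steps, and the surrogate-key format shown is just one common convention:

```python
from datetime import date, timedelta

# Date-dimension sketch: start/end parameters -> one row per day,
# each row carrying derived attributes (year, month, day).
def build_date_dimension(start: date, end: date):
    days = (end - start).days + 1  # number of rows to generate
    rows = []
    for i in range(days):
        d = start + timedelta(days=i)
        rows.append({
            "date_tk": d.year * 10000 + d.month * 100 + d.day,  # e.g. 20240131
            "year": d.year, "month": d.month, "day": d.day,
        })
    return rows

dim = build_date_dimension(date(2024, 1, 1), date(2024, 1, 31))
```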

Creating the Fact Transformation

1
Sourcing Raw Data for Fact Table

Learn how to efficiently create an input query for your fact transformation.

2
Looking up the Slowly Changing Dimension Type 1 Key

Learn how to configure the step to look up the SCD type 1 keys.

3
Looking up the Slowly Changing Dimension Type 2 Key

Learn how to configure the step to look up the SCD type 2 keys.
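
For a Type 2 dimension, the lookup must also respect validity dates: each fact row gets the surrogate key of the dimension version that was valid on the transaction date. A sketch with invented data; PDI's Dimension lookup/update step performs this matching for you:

```python
from datetime import date

# Type 2 key lookup sketch: pick the dimension version whose validity
# interval covers the transaction date. All data here is illustrative.
dim_customer = [
    {"customer_tk": 10, "customer_id": 1,
     "valid_from": date(2024, 1, 1), "valid_to": date(2024, 6, 1)},
    {"customer_tk": 11, "customer_id": 1,
     "valid_from": date(2024, 6, 1), "valid_to": date(9999, 12, 31)},
]

def lookup_type2_key(customer_id, tx_date):
    for row in dim_customer:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= tx_date < row["valid_to"]):
            return row["customer_tk"]
    return None  # or a special "unknown member" key

key = lookup_type2_key(1, date(2024, 3, 15))
```

A March sale resolves to the older version (key 10), while a July sale would resolve to the current one (key 11): that is exactly how the fact table preserves history.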

Orchestration

1
Loading Dimensions in Parallel

In our setup, dimensions do not depend on each other; therefore, we can create an ETL job that loads them in parallel.

2
Creating Master Jobs

We will create the main job, which runs all the required child jobs and transformations.

ID-based Change Data Capture

1
Implementing Change Data Capture (CDC)

In this section, you will learn how new data can be automatically loaded into the data mart using the Change Data Capture (CDC) approach.
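
ID-based CDC boils down to a watermark: remember the highest ID already loaded, and on the next run extract only rows above it. A minimal sketch with invented data; in the course this logic is expressed as PDI steps and parameters:

```python
# ID-based CDC sketch: filter the source on a stored high-water mark,
# then advance the mark for the next run. Data is illustrative.
source = [{"sales_id": i, "amount": i * 2.0} for i in range(1, 8)]

def extract_new_rows(last_loaded_id):
    new_rows = [r for r in source if r["sales_id"] > last_loaded_id]
    max_id = max((r["sales_id"] for r in new_rows), default=last_loaded_id)
    return new_rows, max_id  # persist max_id for the next run

batch, watermark = extract_new_rows(5)
```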

2
Creating a CDC Job Flow

We will define the order of execution for all the transformations involved.

Final Touches: Logging and Scheduling

1
Setting up a Dedicated DB Schema

We will create a dedicated environment for logging.

2
Setting up Built-in Logging

Pentaho Kettle features built-in logging. You will learn how to configure it.

3
Scheduling on the Command Line

Learn how to schedule a daily run of your ETL process.

3.8 out of 5
44 Ratings

Detailed Rating

5 stars: 8
4 stars: 17
3 stars: 17
2 stars: 1
1 star: 2
30-Day Money-Back Guarantee

Includes

2 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion