4.1 out of 5
795 reviews on Udemy

ETL Testing: From Beginner to Expert

ETL Testing: Essential course for all software testing professionals.
Instructor:
Sid Inf
4,417 students enrolled
English [Auto-generated]
Understand the concepts of Business Intelligence and Data Warehousing
Learn what ETL Testing is, along with the QA lifecycle and RDBMS concepts
Gain an in-depth understanding of Data Warehouse WorkFlow and comparison between Database Testing and Data Warehouse Testing
Understand different ETL Testing scenarios like Constraint Testing, Source to Target Testing, Business Rules Testing, Negative Scenarios Testing, Dependency Testing, Error Handling Testing
Perform data Checks using SQL and understand the scope of BI Testing

The DW/BI/ETL Testing training course is designed for both entry-level and advanced programmers. The course covers the foundations of the data warehouse: its core concepts, dimensional modeling, and the important aspects of dimensions, facts and slowly changing dimensions, along with the DW/BI/ETL setup, Database Testing vs. Data Warehouse Testing, the data warehouse workflow and a case study, data checks using SQL, and the scope of BI testing. As a bonus, you will also get the steps to set up an environment with the most popular ETL tool, Informatica, on your personal computer, so you can perform all the activities yourself and gain first-hand practical knowledge.

Welcome! Thank you for learning the ETL Testing Course with me!

1
Welcome! Thank you for learning the ETL Testing Course with me!

In this lecture we talk about the layout of the course, what is covered, and how to get the best out of it.

Before we start

1
What is Data Testing/ETL Testing and the Challenges in ETL testing?

ETL is commonly associated with data warehousing projects, but in reality any form of bulk data movement from a source to a target can be considered ETL. ETL testing is a data-centric testing process that validates that the data has been transformed and loaded into the target as expected.


In this lecture we also talk about data testing and challenges in ETL testing. 
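A minimal sketch of such a data-centric check - comparing row counts and column totals between source and target - using Python's built-in sqlite3 as a stand-in for real source and target databases. The table, columns and data are illustrative, not from the course:

```python
import sqlite3

# In-memory databases standing in for the source and target systems.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")

src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, 7.25)])

tgt.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
tgt.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, 7.25)])

def count_check(conn, table):
    # Row counts should match between source and target after the load.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def sum_check(conn, table, column):
    # Column totals catch truncation or rounding introduced by the transform.
    return conn.execute(f"SELECT SUM({column}) FROM {table}").fetchone()[0]

counts_match = count_check(src, "orders") == count_check(tgt, "orders")
sums_match = sum_check(src, "orders", "amount") == sum_check(tgt, "orders", "amount")
```

Real ETL test suites run many such checks (constraints, business rules, negative scenarios) against much larger data sets, but the count/sum pattern above is the usual starting point.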


2
What will be the future of ETL tools/testing as big data is getting popular?

This is a common question asked by many non-Java/Big Data IT professionals about their current technologies and where they are headed.

When it comes to the ETL and DW world in particular, the future looks better than ever: "Big Data" increases the demand for better processing of data, and these tools excel at exactly that.

The Basics - Data warehouse Concepts course

1
Master Data Warehouse Concepts, Step by Step from Scratch
2
Is Data Warehouse still relevant in the age of Big Data?

The original intent of the data warehouse was to segregate analytical operations from mainframe transaction processing in order to avoid slowdowns in transaction response times, and minimize the increased CPU costs accrued by running ad hoc queries and creating and distributing reports. Over time, the enterprise data warehouse became a core component of information architectures, and it's now rare to find a mature business that doesn't employ some form of an EDW or a collection of smaller data marts to support business intelligence, reporting and analytics applications.

In this lecture we see what will be the future of Data warehouse in the age of Big Data. 

3
What is Data?

Data is a collection of raw material in an unorganized format that refers to an object.

4
Why Data Warehouse is implemented?

The concept of data warehousing is not hard to understand. The notion is to create a permanent storage space for the data needed to support reporting, analysis, and other BI functions. In this lecture we look at the main reasons for creating a data warehouse and the benefits it brings.

This long list of benefits is what makes data warehousing an essential management tool for businesses that have reached a certain level of complexity.

5
What is a Data Warehouse?

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

In addition to a relational database, a data warehouse environment includes an extraction, transportation, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.

6
Test your understanding on the Data Warehouse basics

Test your understanding on the Data Warehouse basics

Data Mart

1
What is a Data Mart?

The data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. Data marts are small slices of the data warehouse. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.

2
Fundamental difference between Data Warehouse and Data Mart

Data Warehouse:

  • Holds multiple subject areas
  • Holds very detailed information
  • Works to integrate all data sources
  • Does not necessarily use a dimensional model but feeds dimensional models.

Data Mart:

  • Often holds only one subject area - for example, Finance or Sales
  • May hold more summarized data (although many hold full detail)
  • Concentrates on integrating information from a given subject area or set of source systems
  • Is built around a dimensional model, using a star schema.
3
Advantages of a Data Mart

The primary advantages are:

  • Data Segregation: Each box of information is developed without changing the other ones. This boosts information security and the quality of data.
  • Easier Access to Information: These data structures provide an easier way to interpret the information stored in the database
  • Faster Response: Derived from the adopted structure
  • Simple queries: Based on the structure and size of the data
  • Subject full detailed data: Might also provide summarization of the information
  • Specific to User Needs: This set of data is focused on the end user's needs
  • Easy to Create and Maintain
4
Characteristics of a Data Mart
  • Easy access to frequently needed data
  • Creates collective view by a group of users
  • Improves end-user response time
  • Ease of creation
  • Lower cost than implementing a full data warehouse
  • Potential users are more clearly defined than in a full data warehouse
  • Contains only business essential data and is less cluttered.
5
Disadvantages of a Data Mart

Disadvantages of Data Marts are discussed in this lecture. 

6
Mistakes and Misconceptions of a Data Mart

This lecture talks about the mistakes and the misconceptions one may have with regard to the data mart.

7
Test your understanding on the Data Mart Concepts

Test your understanding on the Data Mart Concepts

Data Warehouse Architectures

1
Revised: Enterprise Architecture or Centralized Architecture

In this lecture we see how the Centralized architecture is set up, in which there exists only one data warehouse which stores all data necessary for the business analysis

2
Revised: Federated Data Warehouse Architecture

In a Federated Architecture the data is logically consolidated but stored in separate physical databases, at the same or at different physical sites. The local data marts store only the information relevant to a department.

The amount of data is reduced in contrast to a central data warehouse. The level of detail is enhanced in this kind of model. 

3
Multi-Tiered Architecture

Multi-tiered architecture is a distributed data approach. The process cannot be done in one step, because many sources have to be integrated into the warehouse.

4
Components of a Data Warehouse Architecture

Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. Some may have a small number of data sources, while some may have dozens of data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture rather than discussing the specifics of any one system.

In general, all data warehouse systems have the following layers:

  • Data Source Layer
  • Data Extraction Layer
  • Staging Area
  • ETL Layer
  • Data Storage Layer
  • Data Logic Layer
  • Data Presentation Layer
  • Metadata Layer
  • System Operations Layer
5
Purpose of a Staging Area in Data Warehouse Architecture - Part 1

This is where data is stored prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes it easier for subsequent data processing / integration. Based on the business architecture and design there can be more than one staging area which can be termed with different naming conventions. 

6
Purpose of a Staging Area in Data Warehouse Architecture - Part 2

Continuing from Part 1, this lecture looks further at the staging area: based on the business architecture and design there can be more than one staging area, each termed with different naming conventions.

7
Test your understanding on the Data Warehouse Architecture

Test your understanding on the Data Warehouse Architecture

Dimensional Modeling

1
What is Data Modeling?

Data modeling is the formalization and documentation of existing processes and events that occur during application software design and development. 

2
Why should a Tester know Data Modeling?

The below aspects will be discussed in this lecture. 


  • Functional and Technical Aspects
  • Completeness in the design
  • Understanding DB Test Execution
  • Validation

3
Data Modeling Techniques

Data modeling techniques and tools capture and translate complex system designs into easily understood representations of the data flows and processes, creating a blueprint for construction and/or re-engineering. 

4
ER Data Model

An entity–relationship model (ER model) is a data model for describing the data or information aspects of a business domain or its process requirements, in an abstract way that lends itself to ultimately being implemented in a database such as a relational database.

5
Dimensional Model

A dimensional model is a database structure that is optimized for online queries and data warehousing tools. It is composed of "fact" and "dimension" tables. A "fact" is a numeric value that a business wishes to count or sum. A "dimension" is essentially an entry point for getting at the facts.

6
Differences between ER Model and Dimensional Model

In this lecture we talk about the differences between ER model and the Dimensional Model.

How to build a Dimensional Model?

1
Different phases required to build a Dimensional Data Model

To build a Dimensional Model we need to follow five different phases:

  • Gathering Business Requirements
  • Conceptual Data Model
  • Logical Data Model
  • Physical Data Model
  • Database Implementation
2
Business Requirements

Data Modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs. 

3
CDM - Conceptual Data Model

This model includes all major entities and relationships, but does not contain much detail about attributes; it is often used in the initial planning phase.

4
LDM - Logical Data Model

In this phase the conceptual model is implemented as a logical data model. A logical data model is the version of the model that represents all of the business requirements of an organization.

5
Physical Data Model

This is a complete model that includes all required tables, columns, relationships, database properties for the physical implementation of the database. 

6
Database

DBAs or ETL developers prepare the scripts to create the entities, attributes and their relationships.


In this lecture we also talk about the creation of reusable database scripts that can be reused multiple times.

7
Test your understanding

On how to create a Dimensional Data Model

Various Objects in a Dimensional Model

1
What is a Dimension Table?

A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. Commonly used dimensions are people, products, place and time. In a data warehouse, dimensions provide structured labeling information to otherwise unordered numeric measures.

2
What is a Fact Table?

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is often located at the center of a star schema, surrounded by dimension tables.

There are four types of facts:

  • Additive - Measures that can be added across all dimensions.
  • Non-additive - Measures that cannot be added across any dimension.
  • Semi-additive - Measures that can be added across some dimensions but not others.
  • Factless fact tables - Fact tables that have no aggregate numeric values or information.
3
Additive Facts

Additive facts are measures that can be summed across all of the dimensions in the fact table - for example, a sales amount can be added up by product, by store and by day.
4
Semi Additive Facts

The numeric measures in a fact table fall into three categories. The most flexible and useful facts are fully additive; additive measures can be summed across any of the dimensions associated with the fact table. Semi-additive measures can be summed across some dimensions, but not all; balance amounts are common semi-additive facts because they are additive across all dimensions except time.
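A tiny worked example of the distinction, with invented balance data:

```python
# Month-end balances for two accounts (illustrative data).
# Each tuple: (month, account, balance)
balances = [
    ("Jan", "A", 100), ("Jan", "B", 200),
    ("Feb", "A", 150), ("Feb", "B", 250),
]

# Additive across the account dimension: total balance for one month is meaningful.
jan_total = sum(b for m, a, b in balances if m == "Jan")  # 100 + 200

# NOT additive across time: summing a balance over months is meaningless,
# so semi-additive facts are typically averaged (or last-valued) over time.
raw_sum_over_time = sum(b for m, a, b in balances if a == "A")  # 100 + 150
avg_balance_a = raw_sum_over_time / 2  # average balance for account A
```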

5
Non-Additive Facts

Non-additive facts are measures that cannot be summed across any of the dimensions - ratios and percentages are typical examples.
6
Fact less Facts

A factless fact table has no aggregate numeric values or measures; it records only the occurrence of an event or the existence of a relationship - for example, student attendance.
7
What is a Star Schema?

A star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions.
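To make the fact-and-dimension layout concrete, here is a minimal star-schema sketch using Python's built-in sqlite3. All table names, columns and data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One central fact table surrounded by dimension tables.
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER)")
cur.execute("CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, amount REAL)")

cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?)", [(10, 2023), (11, 2024)])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 10, 5.0), (1, 11, 7.0), (2, 11, 3.0)])

# A typical star-schema query: join the fact to its dimensions and aggregate.
rows = cur.execute("""
    SELECT p.name, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_date d ON f.date_id = d.date_id
    GROUP BY p.name, d.year
    ORDER BY p.name, d.year
""").fetchall()
```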

8
What is a Snow Flake Schema?

 The snowflake schema is diagrammed with each fact surrounded by its associated dimensions (as in a star schema), and those dimensions are further related to other dimensions, branching out into a snowflake pattern.

9
Galaxy Schema or Fact Constellation Schema

The galaxy schema, also known as the fact constellation schema, is a combination of the star and snowflake schemas, in which multiple fact tables share dimension tables.

10
Snow Flake Vs Star Schema

When choosing a database schema for a data warehouse, snowflake and star schema tend to be popular choices. This comparison discusses suitability of star vs. snowflake schema in different scenarios and their characteristics.

11
Conformed Dimensions

A conformed dimension is a dimension that has exactly the same meaning and content when being referred to from different fact tables. A conformed dimension can refer to multiple tables in multiple data marts within the same organization.

12
Junk Dimensions

A junk dimension combines several low-cardinality indicator fields into a single dimension. This way, only a single dimension table needs to be built, and the number of fields in the fact table, as well as the size of the fact table, can be decreased.
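A sketch of how such a dimension might be built, assuming three hypothetical indicator fields (the flag names are invented for illustration):

```python
from itertools import product

# Low-cardinality indicator fields that would otherwise each need
# their own column in the fact table (names are illustrative).
flags = {
    "is_gift": [0, 1],
    "is_express": [0, 1],
    "payment_type": ["card", "cash"],
}

# Build the junk dimension: one row per combination, keyed by a surrogate key.
junk_dim = {}
for key, combo in enumerate(product(*flags.values()), start=1):
    junk_dim[combo] = key

# The fact table then stores a single junk_key instead of three flag columns.
fact_row = {"order_id": 1001, "junk_key": junk_dim[(1, 0, "card")]}
```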

13
Degenerate Dimensions

According to Ralph Kimball, who originated the term, a degenerate dimension is a dimension key in the fact table that does not have its own dimension table, because all the interesting attributes have been placed in analytic dimensions.

14
Role Playing Dimensions

A single physical dimension can be referenced multiple times in a fact table, with each reference linking to a logically distinct role for the dimension. For instance, a fact table can have several dates, each of which is represented by a foreign key to the date dimension.

15
Slowly Changing Dimensions - Intro and Example Creation

Slowly Changing Dimensions (SCDs) are dimensions that change slowly over time, rather than on a regular, time-based schedule.

16
Slowly Changing Dimensions (SCD) Type 1, 2, 3

There are many approaches to dealing with SCDs. The most popular are:

  • Type 0 - The passive method
  • Type 1 - Overwriting the old value
  • Type 2 - Creating a new additional record
  • Type 3 - Adding a new column
  • Type 4 - Using a historical table
  • Type 6 - Combining the approaches of types 1, 2 and 3 (1+2+3=6)
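One common way to implement Type 2 is sketched below in plain Python: a change expires the current row and inserts a new one, preserving history. The customer attributes and date columns are illustrative:

```python
# A dimension table as a list of rows (illustrative schema).
customer_dim = [
    {"cust_key": 1, "cust_id": "C100", "city": "London",
     "valid_from": "2020-01-01", "valid_to": None, "current": True},
]

def apply_scd2(dim, cust_id, new_city, change_date):
    # Find the current row for this customer.
    for row in dim:
        if row["cust_id"] == cust_id and row["current"]:
            if row["city"] == new_city:
                return  # no change, nothing to do
            # Expire the old row...
            row["valid_to"] = change_date
            row["current"] = False
            break
    # ...and add a new current row carrying the changed attribute.
    dim.append({"cust_key": len(dim) + 1, "cust_id": cust_id, "city": new_city,
                "valid_from": change_date, "valid_to": None, "current": True})

apply_scd2(customer_dim, "C100", "Paris", "2023-06-01")
```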
17
Slowly Changing Dimensions - Summary

Dimension, Fact and SCD Type 1, 2 and 3 are reviewed in this lecture. 

18
Test your understanding on Dimensional Model Objects

Test your understanding on Dimensional Model Objects

Data Integration and ETL

1
What is Data Integration?

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution delivers trusted data from a variety of sources.

2
What is ETL?

ETL is short for extract, transform, load, three database functions that are combined into one tool to pull data out of one database and place it into another database.

Extract is the process of reading data from a database.

Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data.

Load is the process of writing the data into the target database.

ETL is used to migrate data from one database to another, to form data marts and data warehouses and also to convert databases from one format or type to another.
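The three steps can be sketched end to end with Python's built-in sqlite3. The schema and the transformation rule here are illustrative, not from any particular tool:

```python
import sqlite3

# Source database with raw data (illustrative schema).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE raw_sales (region TEXT, amount TEXT)")
src.executemany("INSERT INTO raw_sales VALUES (?, ?)",
                [("north", "100.5"), ("south", "200"), ("north", "50")])

# Target database standing in for the warehouse side.
tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Extract: read the rows from the source.
rows = src.execute("SELECT region, amount FROM raw_sales").fetchall()

# Transform: apply a rule - uppercase the region, cast amount to a number.
transformed = [(region.upper(), float(amount)) for region, amount in rows]

# Load: write the transformed rows into the target.
tgt.executemany("INSERT INTO sales VALUES (?, ?)", transformed)

total = tgt.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```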

3
Data Acquisition

Data acquisition is the process of extracting data from different source systems (operational databases), integrating it, transforming it into a homogeneous format, and loading it into the target warehouse database - simply called ETL (Extraction, Transformation and Loading). Different ETL vendors refer to their data acquisition process designs in different ways.

4
Data Transformation

Data transformation is the process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system.

5
Common Questions and Summary

In this lecture we discuss the common questions raised about Data Integration and ETL.

6
Test your understanding on Data Integration and ETL

Test your understanding on Data Integration and ETL

ETL Vs ELT

1
ETL - Explained

As covered in the Data Integration and ETL section, ETL is short for extract, transform, load - three database functions combined into one tool to pull data out of one database and place it into another. Extraction reads the data from a database, transformation converts it into the form required by the target (using rules, lookup tables, or by combining the data with other data), and loading writes it into the target database.

2
ELT - Explained

ELT is a variation of extract, transform, load (ETL): a data integration process in which the extracted data is loaded into the target system first, and the transformation is performed there by the target, rather than on an intermediate server before loading.

3
ETL Vs ELT

ELT makes sense when the target is a high-end data engine, such as a data appliance, Hadoop cluster, or cloud installation to name three examples.  If this power is there, why not use it?

ETL, on the other hand, is designed using a pipeline approach. While data is flowing from the source to the target, a transformation engine (something unique to the tool) takes care of any data changes.

Which is better depends on priorities. All things being equal, it’s better to have fewer moving parts. ELT has no transformation engine – the work is done by the target system, which is already there and probably being used for other development work. On the other hand, the ETL approach can provide drastically better performance in certain scenarios. The training and development costs of ETL need to be weighed against the need for better performance. (Additionally, if you don’t have a target system powerful enough for ELT, ETL may be more economical.)

Typical Roles In DWH Project

1
Project Sponsor

Project sponsorship is an active senior management role, responsible for identifying the business need, problem or opportunity. The sponsor ensures the project remains a viable proposition and that benefits are realized, resolving any issues outside the control of the project manager.

2
Project Manager

This person will oversee the progress and be responsible for the success of the data warehousing project.

3
Functional Analyst or Business Analyst

The role of the business analyst is to perform research and possess knowledge of existing business applications and processes to assist in identification of potential data sources, business rules being applied to data as it is captured by and moved through the transaction processing applications, etc. Whenever possible, this role should be filled by someone who has extensive prior experience with a broad range of the organization's business applications. 

4
SME - Subject Matter Expert

A subject-matter expert (SME) or domain expert is a person who is an authority in a particular area or topic. The term domain expert is frequently used in expert systems software development, where it always refers to a domain other than the software domain.

5
DW BI Architect and Data Modeler

Data Warehouse Architect: These job responsibilities encompass definition of overall data warehouse architectures and standards, definition of data models for the data warehouse and all data marts, evaluation and selection of infrastructure components including hardware, DBMS, networking facilities, ETL (extract, transform and load) software, performing applications design and related tasks.

Data Modeler: The person(s) in this role prepares data models for the source systems based on information provided by the business and/or data analysts. Additionally, the data modeler may assist with the development of the data model (s) for the EDW or a data mart guided by the data warehouse architect. This individual may also assist in the development of business process models, etc.

6
DWH Tech Admin

This position is responsible for maintaining hardware reliability, system level security, system level performance monitoring and tuning, and automation of production activities including extract and load functions, repetitively produced queries/reports, etc. The duties include the setup of user IDs and system access roles for each person or group which is given access to the data warehouse or data mart and monitoring the file system for space availability. In many cases, the system administrator is responsible for ensuring that appropriate disaster recovery functions such as system level backups are performed correctly and on an accepted schedule.

7
ETL Developers

The person or persons functioning within this role will need a substantial understanding of the data warehouse design, load function, etc. Potentially the DW developer may also be required to have some knowledge of the tools and programs used to extract data from the source systems and perform maintenance on those applications. Additionally the ETL Developer may be required to be knowledgeable in the data access tools and perform some data access function development.

8
BI OLAP Developers

In this lecture, we talk about the roles of the reporting team members who create static dashboards and reporting structures.

9
ETL Testers / QA Group
This role is responsible for ensuring the correctness of the data in the data warehouse. This role is more important than it appears, because bad data quality turns away users more than any other reason, and often is the start of the downfall for the data warehousing project.
10
DB UNIX Network Admins

 If your project is large enough to require dedicated resources for system administration and database administrators (DBAs), it is possible you will want a person who will provide leadership and direction for these efforts. This would be someone who is familiar with the hardware and software likely to be used, experienced in administration of these areas and who can direct tuning and optimization efforts as warehouse development and use moves forward in the organization. Including the infrastructure team within the large data warehousing group helps ensure that the needed resources are available as needed to ensure that the project stays on track and within budget.

11
Data Architect, Data Warehouse Architect, BI Architect and Solution Architect

A data architect is a practitioner of data architecture, an information technology discipline concerned with designing, creating, deploying and managing an organization's data architecture.

A data warehouse architect does a lot more than just data modelling. They are also responsible for the data architecture, ETL, database platform and physical infrastructure.

A business intelligence architect (BI architect) is a top-level business intelligence analyst who deals with specific aspects of business intelligence, a discipline that uses data in certain ways and builds specific architectures to benefit a business or organization. The business intelligence architect will generally be responsible for creating or working with these architectures, which serve the specific purpose of maximizing the potential of data assets.

Systems architects define the architecture of a computerized system (i.e., a system composed of software and hardware) in order to fulfill certain requirements.

Solution architecture is a practice of defining and describing an architecture of a system delivered in context of a specific solution and as such it may encompass description of an entire system or only its specific parts. Definition of a solution architecture is typically led by a solutions architect.

An enterprise architect is a person responsible for performing this complex analysis of business structure and processes and is often called upon to draw conclusions from the information collected.

12
Final Note about the Roles

Please note that the roles explained above are not an exhaustive list, nor is every role mandatory for every project. Which roles are created and filled depends on the project's architecture and the business flow. A single role mentioned here can be split into more than one, or a couple of roles can be merged into one, based on the requirement.

13
Test your understanding on different roles in a DWH project

Test your understanding on different roles in a DWH project

DW/BI/ETL Implementation Approach

1
Different phases in DW/BI/ETL Implementation Approach

A quick recap of  the different phases which are involved in most of the DW/BI/ETL projects. 

2
Knowledge Capture Sessions

In this lecture we talk about the key features of the knowledge capture sessions held before the requirements are gathered.

3
Requirements

A critical early activity is requirements creation, or the BRD (Business Requirement Document) creation. Requirements gathering sounds like common sense, but surprisingly, it's an area that is given far too little attention.

In this lecture we talk about BRD best practices and common mistakes to avoid.

4
Architecture Phase

The Architecture phase's importance and its dependency on the previous two phases are explained in this lecture.

5
Data Model/Database

Once the Architecture phase is complete, the Data Model/Database phase converts the Conceptual Data Model to the Logical Data Model and then to the Physical Data Model.

6
ETL Phase

In this lecture we learn about the ETL phase and how it takes around 70% of the overall project implementation time.

7
Data Access Phase

Data Access is the OLAP layer, or the Reporting layer. There are multiple ways the data can be accessed. Here are a few of them:

  • Selection
  • Drilling Down
  • Exception Handling
  • Calculations
  • Graphics/Visualization
  • Data Entry Options
  • Customization
  • Web Based Reporting
  • Broadcasting

Each of these is discussed in further detail in the next lectures.

8
Data Access Types - Selection

Selection is the most common and important feature of any OLAP tool. 

9
Data Access Types - Drilling Down

Drilling down through a database involves accessing information by starting with a general category and moving through the hierarchy: from category to file/table to record to field. 


When one drills down, one performs de facto data analysis on a parent attribute. Drilling down provides a method of exploring multidimensional data by moving from one level of detail to the next. Drill-down levels depend on the data granularity.
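A drill-down can be illustrated as two aggregations of the same measure at different levels of detail; the sample data and column names below are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, month TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(2023, "Jan", 10.0), (2023, "Feb", 20.0), (2024, "Jan", 5.0)])

# Summary level: totals by year (the "general category").
by_year = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year").fetchall()

# Drilling down into 2023: the same measure at the next level of detail.
by_month_2023 = conn.execute(
    "SELECT month, SUM(amount) FROM sales WHERE year = 2023 "
    "GROUP BY month ORDER BY month").fetchall()
```

How far one can drill depends on the data granularity, as noted above: monthly data cannot be drilled into days.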

10
Data Access Types - Exception Reporting

Exception reporting eliminates the need to review countless reports to identify and address key business process issues before they begin to negatively impact a firm’s operations or profitability. 

11
Data Access Types - Calculations

In this lecture, we talk about the measures of the facts and how these are calculated based on the business validations and requirements. 

12
Data Access Types - Graphics and Visualization

Visualization is the process of representing data graphically and interacting with these representations in order to gain insight into the data. Traditionally, computer graphics has provided a powerful mechanism for creating, manipulating, and interacting with these representations.

4.1 out of 5
795 Ratings

Detailed Rating

Stars 5
293
Stars 4
282
Stars 3
134
Stars 2
50
Stars 1
37
30-Day Money-Back Guarantee

Includes

19 hours on-demand video
1 article
Full lifetime access
Access on mobile and TV
Certificate of Completion