
ETL Testing Interview Questions

Real time ETL Testing Interview Questions and Answers
Instructor:
Sid Inf
Tips on how to handle ETL Testing interviews
Practical questions asked during real-time interviews, and how to answer them.
Understand what kind of questions are asked in Beginner ETL Testing Interviews
Answer questions on DataWarehouse Concepts, SQL, Informatica Transformations, Workflows and ETL Testing.
Understand ETL Testing Best Practices

Preparing for an interview is tricky. You need a good understanding of new features and to revise the concepts you have worked with. This course helps you prepare for ETL Testing interviews with hands-on code examples covering 200+ interview questions and answers on a wide range of topics.

Discover not just what the interview questions are, but how to answer them to ensure you get the job as an ETL Testing or Data Warehouse Testing professional. Every question is answered with a detailed explanation and a narration of what happens practically in real-time projects.

What will you learn?

  • Understand what kind of questions are asked in ETL/DWH/BI Testing Interviews.

  • Answer questions on Data Warehouse concepts, ETL, BI and various other practical scenarios in real-time projects.

  • Understand New Features of ETL Tools.

  • Understand Basic Testing Concepts.

  • Understand Advanced practical DWH/ETL/BI Testing Concepts.

  • Answer questions on Data Validations, Test Data Creation and Supporting the business in UAT.

  • And much more….

What is required for you to start with this course?

  • Familiarity with RDBMS Concepts and basics of ETL Testing.

  • In the course, we use Informatica 9x/10x and Oracle 11g to demonstrate examples.

Introduction

1
Course Pre-Requisite and Introduction

Course Pre-Requisite and Introduction

Interview Questions on ETL and ETL Concepts

1
What do you understand about ETL?

ETL stands for Extract, Transform, and Load. It is an important concept in Data Warehousing systems. Extraction stands for extracting data from different data sources such as transactional systems or applications. Transformation stands for applying the conversion rules on data so that it becomes suitable for analytical reporting. The loading process involves moving the data into the target system, normally a data warehouse.
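
A minimal SQL sketch of the idea, assuming hypothetical STG_ORDERS (source/staging) and DW_ORDERS (target) tables; in practice an ETL tool generates the equivalent logic:

-- Extract rows from the source, apply a conversion rule (Transform),
-- and move the result into the warehouse target table (Load).
INSERT INTO dw_orders (order_id, customer_id, order_amount_usd, load_date)
SELECT order_id,
       customer_id,
       ROUND(order_amount * exchange_rate, 2),
       SYSDATE
FROM   stg_orders
WHERE  order_status = 'COMPLETE';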

2
What do ETL testing operations include?

ETL testing helps to

  • Verify that the data is transformed as defined in the business requirements
  • Verify that the projected and expected data is loaded into the data warehouse without any truncation or data loss
  • Make sure that the ETL application identifies and reports invalid data and replaces it with default values wherever defined
  • Make sure that data loads within the expected time window to confirm scalability and performance
  • Make sure that the data flows across the multiple layers as defined in the architecture and adheres to the technical and business rules (a simple SQL sketch of such source-to-target checks follows this list).
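
A minimal sketch of two such source-to-target checks, assuming hypothetical SRC_CUSTOMER and DW_CUSTOMER tables:

-- Record count reconciliation: both queries should return the same number.
SELECT COUNT(*) FROM src_customer;
SELECT COUNT(*) FROM dw_customer;

-- Completeness check: rows present in the source but missing in the target.
SELECT customer_id, customer_name
FROM   src_customer
MINUS
SELECT customer_id, customer_name
FROM   dw_customer;
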
3
What are the popular ETL tools available in the market?

Many ETL tools were originally developed to make the task of the data warehouse developer easier and more fun. Developers are spared the arduous task of handwriting SQL code, replacing it with easy drag-and-drop development of a data warehouse.

Today, the top ETL tools in the market have vastly expanded their functionality beyond data warehousing and ETL. They now contain extended functionalities for data profiling, data cleansing, Enterprise Application Integration (EAI), Big Data processing, data governance and master data management.

The popular ETL tools available in the market are −

  • Informatica − Power Center
  • IBM − Websphere DataStage (Formerly known as Ascential DataStage)
  • SAP − Business Objects Data Services BODS
  • IBM − Cognos Data Manager (Formerly known as Cognos Decision Stream)
  • Microsoft − SQL Server Integration Services SSIS
  • Oracle − Data Integrator ODI (Formerly known as Sunopsis Data Conductor)
  • SAS − Data Integration Studio
  • Oracle − Warehouse Builder
  • ABInitio
  • Open source Clover ETL

Interview questions on Data Warehouse Concepts

1
Why is a Data Warehouse implemented?

The concept of data warehousing is not hard to understand. The notion is to create a permanent storage space for the data needed to support reporting, analysis, and other BI functions. In this lecture we cover the main reasons for creating a data warehouse and its benefits.

This long list of benefits is what makes data warehousing an essential management tool for businesses that have reached a certain level of complexity.

2
What are different Data Warehouse architectures which a project can have?

There are different terms used when any data warehouse architecture is mentioned. In this lecture, we will understand which architectures are possible and which are practically implemented.

3
What is the use of Staging in a Data Warehouse architecture?

The Data Warehouse Staging Area is a temporary location where data from source systems is copied. A staging area is mainly required in a Data Warehousing Architecture for timing reasons. In short, all required data must be available before data can be integrated into the Data Warehouse.

Due to varying business cycles, data processing cycles, hardware and network resource limitations and geographical factors, it is not feasible to extract all the data from all Operational databases at exactly the same time.

4
What is a Data Mart?

A Data Mart is a subset of a data warehouse that can provide data for reporting and analysis on a section, unit or a department like Sales Dept, HR Dept, etc.

5
Is it mandatory to have Data Marts in the Data Warehouse architecture or project?

The warehouse itself can be structured for analysis, in which case no data marts are needed. However, data marts are often employed and typically contain subsets of data from the warehouse, organized to provide a particular analytical view and delivered to groups or individuals in an organization. For example, a mart of marketing data could be provided for the advertising department and one of staff data to the HR department.

So, even if you have a warehouse you don't have to use data marts. Equally, it is possible to create marts without a warehouse. You can put the data from a particular system, a finance system for example, into a data mart and structure the data for analytical purposes.

Once you understand the problems that warehouses and marts are designed to address, it is much easier to decide which you need.

6
What is the need for a Data Mart in a project?

As explained in the previous question, a data mart delivers a subset of warehouse data, organized for a particular analytical view, to a specific group or department. A project may use a warehouse without marts, marts without a warehouse, or both; which you need depends on the analytical problems you are trying to address.

7
Where does the Data Mart fit into in the overall Data Warehouse architecture?

If a project requires both a data mart and a data warehouse, this lecture explains how that combined architecture looks.

8
What is OLAP?

OLAP stands for Online Analytical Processing. It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data.
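
A minimal sketch of the kind of multidimensional query OLAP supports, assuming hypothetical SALES_FACT, TIME_DIM and PRODUCT_DIM tables (Oracle syntax, since the course uses Oracle 11g):

-- Summarize a measure across two dimensions, including subtotals and a grand total.
SELECT t.calendar_year,
       p.product_category,
       SUM(f.sales_amount) AS total_sales
FROM   sales_fact  f
JOIN   time_dim    t ON t.time_key    = f.time_key
JOIN   product_dim p ON p.product_key = f.product_key
GROUP  BY CUBE (t.calendar_year, p.product_category);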

9
What are the structural differences between an OLTP and OLAP system? Part 1

OLTP stands for Online Transaction Processing. Apart from data warehouse databases, the other databases are OLTP systems. OLTP systems use a normalized schema structure and are designed to record the daily operations and transactions of a business.

10
What are the structural differences between an OLTP and OLAP system? Part 2

This lecture continues the structural comparison of OLTP and OLAP systems from Part 1.

11
What are Dimensions?

Dimensions are categories by which summarized data can be viewed. For example, a profit fact table can be viewed by a time dimension.

12
What are Conformed Dimensions?

Conformed dimensions are dimensions that are reusable across fact tables and fixed in nature, for example customer, time and geography dimensions.

13
What are Fact Tables?

A Fact Table is a table that contains summarized numerical data (facts) and historical data. The fact table has a foreign key to primary key relationship with each dimension table. The fact table maintains the information in 3rd normal form.

14
What are Additive Facts?

The types of facts are as follows.

1. Additive Facts: A fact that can be summed up across any of the dimensions available in the fact table.

2. Semi-Additive Facts: A fact that can be summed up across some dimensions but not across all dimensions available in the fact table (for example, an account balance; see the sketch after this list).

3. Non-Additive Facts: A fact that cannot be summed up across any of the dimensions available in the fact table.
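
A minimal sketch of a semi-additive fact, assuming a hypothetical ACCOUNT_BALANCE_FACT table (account_key, date_key, balance_amount): the balance can be summed across accounts for one day, but not across the time dimension.

-- Valid: total balance held across all accounts on a single day.
SELECT date_key, SUM(balance_amount) AS total_balance
FROM   account_balance_fact
WHERE  date_key = 20240131
GROUP  BY date_key;

-- Misleading: summing the same balances across dates double-counts money,
-- so an average (or the latest value) is typically used over time instead.
SELECT account_key, AVG(balance_amount) AS avg_balance
FROM   account_balance_fact
GROUP  BY account_key;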

15
What are Semi-Additive Facts?

A fact that can be summed up across some dimensions but not across all dimensions available in the fact table.

16
What are Non-Additive Facts?

A fact that cannot be summed up across any of the dimensions available in the fact table.

17
Explain what is Grain of Fact?

The grain of a fact is the level of detail at which the fact information is stored. It is also known as fact granularity.

18
What is a Star schema design?

A star schema is a logical database design in which a centrally located fact table is surrounded by one or more dimension tables. This design is best suited for a Data Warehouse or Data Mart.
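
A minimal DDL sketch of a star schema, using the same hypothetical SALES_FACT, PRODUCT_DIM and TIME_DIM tables as the OLAP example above:

CREATE TABLE product_dim (
    product_key      NUMBER PRIMARY KEY,   -- surrogate key
    product_name     VARCHAR2(100),
    product_category VARCHAR2(50)
);

CREATE TABLE time_dim (
    time_key      NUMBER PRIMARY KEY,
    calendar_date DATE,
    calendar_year NUMBER(4)
);

-- Centrally located fact table referencing the surrounding dimensions.
CREATE TABLE sales_fact (
    product_key  NUMBER REFERENCES product_dim (product_key),
    time_key     NUMBER REFERENCES time_dim (time_key),
    sales_amount NUMBER(12,2),
    quantity     NUMBER
);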

19
What is a Snowflake schema design?

In a snowflake design, a dimension table (a de-normalized table) is further divided into one or more normalized tables to organize the information in a better structural format. To design a snowflake schema, we first design a star schema.
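
A minimal sketch of snowflaking the hypothetical PRODUCT_DIM above: the category attributes move into their own normalized table, which the dimension then references.

CREATE TABLE category_dim (
    category_key  NUMBER PRIMARY KEY,
    category_name VARCHAR2(50)
);

CREATE TABLE product_dim_sf (
    product_key  NUMBER PRIMARY KEY,
    product_name VARCHAR2(100),
    category_key NUMBER REFERENCES category_dim (category_key)  -- normalized branch
);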

20
What is an Operational Data Store [ODS]?

It is a collection of integrated databases designed to support operational monitoring. Unlike the data in OLTP databases, the data in the ODS is integrated, subject-oriented and enterprise-wide.

Interview questions related to tables and database in the project

1
How is ETL Testing different from database testing?
Function | Database Testing | ETL Testing
Primary goal | Data validation and integration | Data extraction, transformation and loading for BI reporting
Applicable system | Transactional system where the business flow occurs | System containing historical data, not in the business flow environment
Common tools in the market | QTP, Selenium, etc. | QuerySurge, Informatica, etc.
Business need | Used to integrate data from multiple applications; severe impact | Used for analytical reporting, information and forecasting
Modeling | ER method | Multidimensional
Database type | Normally applied to OLTP systems | Applied to OLAP systems
Data type | Normalized data with more joins | De-normalized data with fewer joins, more indexes and aggregations
2
Which type of schema is used and why?

What type of schema was used in your project and why?

3
How many tables are there in your project?

One of the most common questions asked in any interview is: how many tables are there in your project or data model?

4
What were/are the tables used in your project?

What were/are the tables used in your project?

5
Which were/are the columns in the tables?

Which were/are the columns in the tables?

6
How many records do your sources and targets have?

How many records do your sources and targets have?

7
Explain different queries used in the project?

Explain different queries used in the project?

8
Have you performed Type 2 validations in your project?

Have you performed Type 2 validations in your project?

9
What are the challenges faced during testing?

What are the challenges faced during testing?

10
How many facts, dimensions and reference tables are there in your project?

How many facts, dimensions and reference tables are there in your project?

11
Questions on facts, dimensions and the related columns

Can you name some columns used in the fact and dimension tables?

12
What is Data Sensitivity?

What is Data Sensitivity?

13
How to identify Fact, Dimension and Reference tables based on data?

How to identify Fact, Dimension and Reference tables based on data?

14
What is Reconciliation? Have you tested it in your project?

What is Reconciliation?

15
How are the exit and entry criteria defined?

How are the exit and entry criteria defined?

16
What is an Aggregate function? Name a few common aggregate functions?

Aggregate functions are used to group multiple rows of a single column to form a more significant measurement. They are also used for performance optimization when we save aggregated tables in the data warehouse.

Common Aggregate functions are −

MIN: returns the smallest value in a given column
MAX: returns the largest value in a given column
SUM: returns the sum of the numeric values in a given column
AVG: returns the average value of a given column
COUNT: returns the total number of values in a given column
COUNT(*): returns the number of rows in a table

Example

SELECT AVG(salary) 
FROM employee 
WHERE title = 'developer';
17
What is a surrogate key in a database?

A surrogate key is a sequence-generated number with no business meaning, used only to identify a row uniquely. It is not visible to users or applications, and it is used in place of the natural (business) key.
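
A minimal Oracle 11g sketch of generating a surrogate key with a sequence, assuming a hypothetical CUSTOMER_DIM table:

CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

-- The surrogate key carries no business meaning; the natural key (customer_number) is stored separately.
INSERT INTO customer_dim (customer_key, customer_number, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'C-1001', 'Acme Corp');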

Interview questions on Data Validations

1
Where do the data validations happen?

Where do the data validations happen?

2
What is Data Cleansing and Data Scrubbing?

What is Data Cleansing and Data Scrubbing?

3
What are Data Type and Data Length Validations?

What are Data Type and Data Length Validations?

4
What are the approaches followed to insert new data file validations?

What are the approaches followed to insert new data file validations?

5
What validations are done in Incremental Loads?

What validations are done in Incremental Loads?

Interview questions related to project/application

1
What areas are covered under finance domain for ETL testing projects?

What areas are covered under finance domain for ETL testing projects?

2
Is the test data created and loaded by the development team?

Is the test data created and loaded by the development team?

3
What are the areas where there is a possibility of defects in the Data Warehouse?

What are the areas where there is a possibility of defects in the Data Warehouse?

4
What are the documents used in a project and where are the validations defined?

What are the documents used in a project and where are the validations defined?

5
Who is responsible for generating the different kinds of documents in a project?

Who is responsible for generating the different kinds of documents in a project?

6
What is a test case document?

What is a test case document?

7
How are Bugs or defects managed in your project?

How are Bugs or defects managed in your project?

8
What were the details in each project document?

What were the details in each project document?

9
What was the complete process followed in your project?

What was the complete process followed in your project?

10
Can you explain the number of downstream systems connected to your project?

Can you explain the number of downstream systems connected to your project?

11
How do you test end-to-end integration of the system?

How do you test end-to-end integration of the system?

Interview questions on RCA (Root Cause Analysis) and UAT Support

1
Did you perform any Root Cause Analysis (RCA) for issues in your project?

Did you perform any Root Cause Analysis (RCA) for issues in your project?

2
What are the steps to prepare test data for UAT (Production like)?

What are the steps to prepare test data for UAT (Production like)?

3
How did you support UAT?

How did you support UAT?

Interview questions related to BI/Reporting

1
What is the difference between ETL and BI tools?

An ETL tool is used to extract data from different data sources, transform the data, and load it into a Data Warehouse. A BI tool, on the other hand, is used to generate interactive, ad-hoc or on-demand reports for end users, dashboards for senior management, and data visualizations for monthly, quarterly and annual board meetings.

Most common ETL tools include − SAP BO Data Services (BODS), Informatica, Microsoft – SSIS, Oracle Data Integrator ODI, Talend Open Studio, Clover ETL Open source, etc.

Most common BI tools include − SAP Business Objects, SAP Lumira, IBM Cognos, JasperSoft, Microsoft BI Platform, Tableau, Oracle Business Intelligence Enterprise Edition, etc.

2
How many downstream systems are there for your application/project?

How many downstream systems are there for your application/project?

3
What is Ad-Hoc Reporting?

What is Ad-Hoc Reporting?

Interview questions on Data Modelling concepts

1
What are some of the basic concepts to consider during Data Modelling?

What are some of the basic concepts to consider during Data Modelling?

Interview questions on Master Data Management (MDM)

1
What is MDM, why is it needed in an enterprise, and how is it set up?

What is MDM, why is it needed in an enterprise, and how is it set up?

Interview questions on Slowly Changing Dimensions (SCD)

1
Have you performed SCD Type 2 Validations in your project?

Have you performed SCD Type 2 Validations in your project?
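
One common validation, sketched here with a hypothetical CUSTOMER_DIM table versioned by a current_flag column: every natural key should have exactly one open (current) version.

-- Any row returned indicates an SCD Type 2 defect.
SELECT customer_number, COUNT(*) AS open_versions
FROM   customer_dim
WHERE  current_flag = 'Y'
GROUP  BY customer_number
HAVING COUNT(*) <> 1;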

2
How is the SCD setup done at ETL layer?

How is the SCD setup done at ETL layer?

3
What are the challenges faced with respect to SCD?

What are the challenges faced with respect to SCD?

4
What is ETL automation and what are the areas in which you have done this?

What is ETL automation and what are the areas in which you have done this?

Interview questions on Informatica Power Center Tool

1
What do you understand by the term ‘transformation’?

A transformation is a repository object that generates, modifies, or passes data. Transformations in a mapping represent the operations the PowerCenter Server performs on the data. Data passes into and out of transformations through ports that you link in a mapping or mapplet. Transformations can be active or passive. An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.

In short, a transformation is a set of rules which generates, modifies, or passes data. Transformations can be of two types − Active and Passive.

2
What are the different types of Transformations available in Informatica?
Transformation | Type | Description
Aggregator | Active / Connected | Performs aggregate calculations.
Application Source Qualifier | Active / Connected | Represents the rows that the PowerCenter Server reads from an application, such as an ERP source, when it runs a session.
Custom | Active or Passive / Connected | Calls a procedure in a shared library or DLL.
Expression | Passive / Connected | Calculates a value.
External Procedure | Passive / Connected or Unconnected | Calls a procedure in a shared library or in the COM layer of Windows.
Filter | Active / Connected | Filters data.
Input | Passive / Connected | Defines mapplet input rows. Available in the Mapplet Designer.
Joiner | Active / Connected | Joins data from different databases or flat file systems.
Lookup | Passive / Connected or Unconnected | Looks up values.
Normalizer | Active / Connected | Source qualifier for COBOL sources. Can also be used in the pipeline to normalize data from relational or flat file sources.
Output | Passive / Connected | Defines mapplet output rows. Available in the Mapplet Designer.
Rank | Active / Connected | Limits records to a top or bottom range.
Router | Active / Connected | Routes data into multiple transformations based on group conditions.
Sequence Generator | Passive / Connected | Generates primary keys.
Sorter | Active / Connected | Sorts data based on a sort key.
Source Qualifier | Active / Connected | Represents the rows that the PowerCenter Server reads from a relational or flat file source when it runs a session.
Stored Procedure | Passive / Connected or Unconnected | Calls a stored procedure.
Transaction Control | Active / Connected | Defines commit and rollback transactions.
Union | Active / Connected | Merges data from different databases or flat file systems.
Update Strategy | Active / Connected | Determines whether to insert, delete, update, or reject rows.
XML Generator | Active / Connected | Reads data from one or more input ports and outputs XML through a single output port.
XML Parser | Active / Connected | Reads XML from one input port and outputs data to one or more output ports.
XML Source Qualifier | Active / Connected | Represents the rows that the PowerCenter Server reads from an XML source when it runs a session.
3
What do you understand by Active and Passive Transformations?

In an active transformation, the number of rows that comes out as output can differ from the number of rows received as input. This does not happen with a passive transformation, which passes through the same number of rows it is given as input.

4
What is lookup transformation and when is it used?

The Lookup transformation allows you to look up data in relational tables that are not defined as sources in the mapping. It is commonly used with slowly changing dimension tables to determine whether a record already exists in the target or not.
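
From a testing point of view, a lookup can be mimicked with an outer join; a minimal sketch assuming hypothetical STG_SALES and PRODUCT_DIM tables (with product_code as the natural key on the dimension):

-- Source rows whose product code finds no match in the lookup (dimension) table.
-- Depending on the mapping document, these should be rejected or mapped to a default key.
SELECT s.sale_id, s.product_code
FROM   stg_sales s
LEFT JOIN product_dim p ON p.product_code = s.product_code
WHERE  p.product_key IS NULL;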

5
Explain the differences between Unconnected and Connected Lookup?

Connected Lookup | Unconnected Lookup
Participates in the mapping data flow | Called as a lookup function from another transformation (for example, an Expression) in the mapping
Can return multiple values | Returns only one output port
Can be connected to other transformations and returns a value | Cannot be connected to another transformation
Can use a static or dynamic cache | Uses only a static cache
Supports user-defined default values | Does not support user-defined default values
Multiple columns can be returned from the same row, or inserted into a dynamic lookup cache | Designates one return port and returns one column from each row
6
What is a Worklet?

A worklet is a group of sessions. To execute a worklet, we have to place it inside a workflow.

7
What is a Workflow?

A workflow is a set of instructions on how to execute tasks such as sessions, emails and shell commands. A workflow is created in the Workflow Manager.

8
Explain the terms − Mapplet, Session, Mapping, Workflow − in an ETL process?
  • A Mapplet defines the transformation rules.
  • A Session is a set of instructions that tells the Integration Service how and when to move data from the source to the target system.
  • A Workflow is a set of instructions that instructs the server on task execution.
  • A Mapping defines the movement of data from the source to the destination.
9
Explain what tracing level is and what the types are?

Tracing level is the amount of detail written to the session log files. The tracing levels, from least to most detailed, are None, Terse, Normal, Verbose Initialization and Verbose Data:

None: Integration Service uses the tracing level set in the mapping.
Terse: Integration Service logs initialization information, error messages, and notification of rejected data.
Normal: Integration Service logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. Summarizes session results, but not at the level of individual rows.
Verbose Initialization: In addition to normal tracing, the Integration Service logs additional initialization details, names of index and data files used, and detailed transformation statistics.
Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping. Also notes where the Integration Service truncates string data to fit the precision of a column and provides detailed transformation statistics.
When you configure the tracing level to verbose data, the Integration Service writes row data for all rows in a block when it processes a transformation.

Interview questions on UNIX commands

1
Where is UNIX used in a Data Warehouse Project?

Where is UNIX used in a Data Warehouse Project?

2
What are the situations where you had to use the UNIX commands in ETL testing?

What are the situations where you had to use the UNIX commands in ETL testing?

3
What is the process to connect to LINUX from a Windows machine in DWBI projects?

This lecture shows how to connect to UNIX/LINUX in real time projects.

4
What are the commands to navigate between folders in UNIX?

What are the commands to navigate between folders in UNIX?

5
Touch command and some basic rules on file creations and file naming conventions

In this lecture, we talk about the common mistakes beginners make in LINUX with file creation and file naming conventions.

6
'ls' command and permissions

Understanding the default layout of the ls command and permissions

7
'file' command

What is the 'file' command and how do you use it?

8
'less' command

The 'less' command and how to use it.

9
What are wildcards? How to use them?

What are wildcards? How to use them?
