4.07 out of 5
46 reviews on Udemy

Data Quality Fundamentals

Understand key concepts, principles and terminology related to Data Quality.
Instructor:
Sid Inf
240 students enrolled
English [Auto-generated]
Determine data quality requirements by studying business functions, gathering information, evaluating output requirements and formats.
Profile select data sets to ensure quality and develop the data visualizations necessary to both manage and communicate data quality.
Coordinate business efforts to deliver data that is fit for use in critical processes, analysis and reports.
Collaborate with the business application team to document information architecture requirements as needed.
Serve as a subject matter expert and perform data quality related functions for urgent, high visibility, high profile, and strategic projects while meeting challenging deadlines.

Data quality is not necessarily data that is devoid of errors. Incorrect data is only one part of the data quality equation. Managing data quality is a never-ending process. Even if a company gets all the pieces in place to handle today’s data quality problems, there will be new and different challenges tomorrow. That’s because business processes, customer expectations, source systems, and business rules all change continuously. To ensure high-quality data, companies need to gain broad commitment to data quality management principles and develop processes and programs that reduce data defects over time.

Much like any other important endeavor, success in data quality depends on having the right people in the right jobs. This course helps you understand key concepts, principles and terminology related to data quality and other areas in data management. 

Data Quality

1
What is Data Quality?

There are many definitions for Data Quality. Here is one of them.

In today’s era of data-driven decision making, data needs to be treated as an organizational asset; data without quality cannot serve any purpose. Data quality is an assessment of data’s fitness for purpose. Data quality is an essential characteristic that determines the reliability of data for making decisions. If the data is not trustworthy, then analytics and reporting that run on the data cannot be trusted.

To put it another way, if you have data quality, your data is capable of delivering the insight you hope to get out of it. Conversely, if you don’t have data quality, there is a problem in your data that will prevent you from using the data to do what you hope to achieve with it.

2
Example of Data Quality

To illustrate the definition of Data Quality, let’s examine a few examples of real-world data quality challenges, using the following situations:

  • A customer's data entry during an online registration
  • How technical issues or bugs can cause Data Quality issues
  • How mergers and acquisitions lead to Data Quality issues
3
Can we achieve 100% Data Quality?

Is 100% Data Quality necessary?

Is it possible in the first place?

4
What can be done to achieve 100% Data Quality?

What steps can be taken to achieve 100% Data Quality?

5
How can we measure Data Quality?

This lecture discusses the different ways to measure Data Quality.

Data Quality Dimensions

1
What are Data Quality Dimensions?

In order for the analyst to determine the scope of the underlying root causes and to plan the ways that tools can be used to address data quality issues, it is valuable to understand these common and core data quality dimensions.

2
Consistency Data Quality Dimension

Consistency means that data across all systems reflects the same information and is in sync across the enterprise. Examples of inconsistency:

  1. A business unit status is closed but there are sales for that business unit.
  2. Employee status is terminated but pay status is active.

Questions you can ask yourself: Are data values the same across the data sets? Are there any distinct occurrences of the same data instances that provide conflicting information?
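The terminated-employee example above can be written as a simple cross-field rule. The following is a minimal sketch in Python; the record layout and the field names `status` and `pay_status` are assumptions made for illustration, not part of the course material.

```python
# Hypothetical employee records; the field names are assumed for illustration.
employees = [
    {"id": 1, "status": "active", "pay_status": "active"},
    {"id": 2, "status": "terminated", "pay_status": "active"},
    {"id": 3, "status": "terminated", "pay_status": "inactive"},
]


def find_inconsistent(records):
    """Return ids of employees who are terminated but still have active pay."""
    return [
        r["id"]
        for r in records
        if r["status"] == "terminated" and r["pay_status"] == "active"
    ]


inconsistent_ids = find_inconsistent(employees)
print(inconsistent_ids)
```

In practice the two fields often live in different systems, so the records would first have to be joined on a shared employee key before a rule like this can be applied.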

3
Completeness Data Quality Dimension

Is all the requisite information available? Are data values missing, or in an unusable state? In some cases, missing data is irrelevant, but when the information that is missing is critical to a specific business process, completeness becomes an issue. 
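Completeness is often reported as the share of records whose critical fields are all populated. The sketch below shows one possible way to compute that in Python; the choice of `email` and `country` as critical fields, and the rule that blank strings count as missing, are assumptions for this example.

```python
# Hypothetical customer records; which fields are critical is an assumption.
CRITICAL_FIELDS = ("email", "country")

customers = [
    {"id": 1, "email": "a@example.com", "country": "DE", "fax": None},
    {"id": 2, "email": "", "country": "US", "fax": None},
    {"id": 3, "email": "c@example.com", "country": None, "fax": "555-0100"},
    {"id": 4, "email": "d@example.com", "country": "FR", "fax": None},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness(records, fields):
    """Share of records in which every critical field is populated.

    An empty record set is reported as fully complete (a design choice).
    """
    if not records:
        return 1.0
    complete = sum(
        1 for r in records if not any(is_missing(r.get(f)) for f in fields)
    )
    return complete / len(records)


score = completeness(customers, CRITICAL_FIELDS)
print(f"{score:.0%} of records are complete")
```

The missing `fax` values do not lower the score, which mirrors the point that missing data only matters when it is critical to a specific business process.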

4
Timeliness Data Quality Dimension

Timeliness references whether information is available when it is expected and needed. Timeliness of data is very important. This is reflected in:

  • Companies that are required to publish their quarterly results within a given frame of time
  • Customer service providing up-to-date information to the customers
  • Credit system checking in real-time on the credit card account activity

Timeliness depends on user expectations. Online availability of data could be required for a room allocation system in hospitality, but nightly data could be perfectly acceptable for a billing system.
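Because the acceptable delay differs per system, a timeliness check can compare the age of the data with a per-system limit. The following Python sketch illustrates the idea; the system names and the limits (one minute, 24 hours) are illustrative assumptions only.

```python
from datetime import datetime, timedelta

# Assumed freshness limits per system; the numbers are illustrative only.
MAX_AGE = {
    "room_allocation": timedelta(minutes=1),
    "billing": timedelta(hours=24),
}


def is_timely(system, last_updated, now):
    """True if the data was refreshed within the limit assumed for the system."""
    return now - last_updated <= MAX_AGE[system]


now = datetime(2024, 1, 2, 8, 0)
last_load = datetime(2024, 1, 1, 23, 0)  # nightly load, nine hours earlier

print(is_timely("billing", last_load, now))
print(is_timely("room_allocation", last_load, now))
```

The same nightly load passes for the billing system and fails for the room allocation system, which is the distinction the paragraph above draws.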

5
Uniqueness Data Quality Dimension

What data is missing important relationship linkages? The inability to link related records together may actually introduce duplication across your systems. Not only that, as more value is derived from analyzing connectivity and relationships, the inability to link related data instances together impedes this valuable analysis.
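One common way to surface records that should have been linked is to group them by a normalized match key. The sketch below does this in Python; matching on a trimmed, lower-cased email address is an assumed rule chosen for the example, and real matching rules are usually more involved.

```python
from collections import defaultdict

# Hypothetical customer records; matching on normalized email is an assumption.
records = [
    {"id": 1, "name": "Ann Lee", "email": "Ann.Lee@example.com"},
    {"id": 2, "name": "A. Lee", "email": " ann.lee@example.com "},
    {"id": 3, "name": "Bob Roy", "email": "bob.roy@example.com"},
]


def match_key(record):
    """Normalize the email so trivially different spellings compare equal."""
    return record["email"].strip().lower()


def find_duplicates(rows):
    """Group record ids by match key and keep only groups with several ids."""
    groups = defaultdict(list)
    for row in rows:
        groups[match_key(row)].append(row["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


duplicates = find_duplicates(records)
print(duplicates)
```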

6
Validity Data Quality Dimension

This lecture covers the Validity Data Quality Dimension. 

7
Accuracy Data Quality Dimension

Accuracy is the degree to which data correctly reflects the real-world object or event being described. Examples:

  1. The recorded sales of the business unit match the real value.
  2. The address of an employee in the employee database is the real address.

Questions you can ask yourself: Do data objects accurately represent the “real world” values they are expected to model? Are there incorrect spellings of product or person names or addresses, or data that is out of date? These issues can impact operational and analytical applications.
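Accuracy can only be measured against something that is trusted to describe the real world. The following Python sketch assumes such a reference source exists (here a hypothetical `reference_addresses` mapping) and reports the share of entries that the stored data reproduces exactly.

```python
# Hypothetical data: a trusted reference source is assumed to exist.
reference_addresses = {101: "12 Main St", 102: "9 Oak Ave", 103: "4 Elm Rd"}
employee_db = {101: "12 Main St", 102: "9 Oak Avenue", 103: "4 Elm Rd"}


def accuracy(stored, reference):
    """Share of reference entries that the stored data reproduces exactly.

    An empty reference is reported as fully accurate (a design choice).
    """
    if not reference:
        return 1.0
    matches = sum(
        1 for key, value in reference.items() if stored.get(key) == value
    )
    return matches / len(reference)


score = accuracy(employee_db, reference_addresses)
print(f"{score:.0%} of addresses match the reference")
```

Exact string comparison is deliberately strict: "9 Oak Avenue" and "9 Oak Ave" count as a mismatch here, although a standardization step could reasonably treat them as the same address.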

8
Example of Data Quality Dimension

This lecture covers examples of all the core dimensions we have discussed so far.

Data Quality Vs Data Governance

1
Data Quality Vs Data Governance

A quick review of the difference between Data Quality and Data Governance.

Data Life Cycle

1
Introduction to the End to End Data Life Cycle with a case study

This lecture describes the Data Life Cycle. Although many different data life cycles have been described, this is the one generally accepted among Data Management professionals.


There are three main ways that data can be captured, and these are very important:

  1. Data Acquisition: the ingestion of already existing data that has been produced by an organization outside the enterprise
  2. Data Entry: the creation of new data values for the enterprise by human operators or devices that generate data for the enterprise
  3. Signal Reception: the capture of data created by devices, typically important in control systems, but becoming more important for information systems with the Internet of Things


2
Data Maintenance

Data Maintenance is the focus of a broad range of data management activities.  Because of this, Data Governance faces a lot of challenges in this area.  Perhaps one of the most important is rationalizing how data is supplied to the end points for Data Synthesis and Data Usage, e.g. preventing proliferation of point-to-point transfers.

3
Data Derivation

Data Derivation is discussed in this lecture. 

4
Data Usage

This lecture discusses how the data is used within the enterprise.

5
Data Publication

This lecture discusses how the data is used within the enterprise and how it is shared with third-party vendors.

6
Data Archival

A data archive is simply a place where data is stored, but where no maintenance, usage, or publication occurs. If necessary, the data can be restored to an environment where one or more of these occur.

7
Data Purging

The removal of every copy of a data item from the enterprise. Ideally, this will be done from an archive.  A Data Governance challenge in this phase of the data life cycle is proving that the purge has actually been done properly.

Data Quality Life Cycle

1
Data Quality Life Cycle

Let's talk about the Data Quality Life Cycle. 

Data Profiling

1
What is Data Profiling?

Data profiling is an assessment of data values within a given data set for uniqueness, consistency, and logic – the three key data quality metrics.
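A basic column profile can be produced with a few counts. The Python sketch below is one possible minimal version; the sample values and the chosen statistics (row count, nulls, distinct values, uniqueness, most common value) are assumptions for illustration rather than a fixed definition of profiling.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: size, nulls, distinct values and uniqueness."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "is_unique": all(n == 1 for n in counts.values()),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical column of country codes with a null and a case variant.
country_codes = ["US", "DE", "US", None, "FR", "us"]
profile = profile_column(country_codes)
print(profile)
```

The lower-case `"us"` is counted as its own distinct value, which is the kind of inconsistency a profile is meant to expose before any cleansing is attempted.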

2
Commonly used data types during Data Profiling

In this lecture, we review the different data types that can be used to profile data.

3
Data Profiling Vs Data Mining

A quick look at the difference between Data Profiling and Data Mining.

4
What are the different types of Data Profiling?

This lecture describes the different types of Data Profiling. 

Business Expectations and Impacts of Low Data Quality

1
Business Expectations on Data Quality

How do Business Expectations differ from Data Quality Expectations? The two sets of expectations should be aligned to form a Data Quality Framework.

2
Impacts and Costs of Low Data Quality - Part 1

This lecture covers the different impacts and costs of not managing the Data Quality.

3
Impacts and Costs of Low Data Quality - Part 2

This lecture covers the different impacts and costs of not managing the Data Quality.

4
How to correct the existing errors in the Data Warehouse?

It is quite common for an existing Data Warehouse or Data Lake to already have Data Quality issues. In this lecture we review the different options for correcting them and how to avoid them in the future.


5
How does the Enhance, Transform and Calculate phase or the ETL phase help?

This lecture discusses how the Enhance, Transform and Calculate phase (or the ETL phase) helps with Data Quality.

6
Data Standardization

Data standardization is the critical process of bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies.
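As a small illustration of bringing data into a common format, the Python sketch below converts dates written in several layouts to ISO 8601. The list of accepted input formats is an assumption for this example; in particular, slash-separated dates are assumed to be day first, which a real data set would have to confirm.

```python
from datetime import datetime

# Assumed input formats; slash-separated dates are assumed to be day first.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None


raw_dates = ["2024-03-05", " 05/03/2024 ", "20240305", "not a date"]
standardized = [standardize_date(d) for d in raw_dates]
print(standardized)
```

Returning `None` for unrecognized values, instead of guessing, keeps the unparseable records visible so they can be reviewed separately.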

7
Complete and Corrected Data

Different cases to complete and correct the data are described in this lecture. 

8
Match and Consolidate the Data

This lecture discusses the match and consolidate process that is applied once the data is in the Warehouse.

Data Quality Roles

1
Different Data Quality Roles in an Enterprise

The different possible Data Quality roles in an enterprise are discussed here. 

Bonus Section

1
Direct links to other courses

This lecture will provide you with the coupons for other courses.

4.1 out of 5
46 Ratings

Detailed Rating

5 stars: 17
4 stars: 16
3 stars: 11
2 stars: 1
1 star: 2
30-Day Money-Back Guarantee

Includes

3 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion