Welcome to A Beginners Guide to Data Warehousing and ETL Development. In this course we will build a data warehouse against a demo transactional database, and cover development of an ETL process to support it. In this course you will go above the basics, I don’t like to skip important concepts during the fundamentals just because they are advanced topics.
During our ETL development you will first learn how to develop SQL code based ETL and then transition into ETL with SSIS. This will ensure that when your faced with an issue in SSIS (technical, performance, etc) you have a full understanding of the underlying layer and diagnose the issue.
This course will focus on the Microsoft SQL Server stack, however the concepts, designs, and methods should transition to most relational databases.
Introduction
Welcome to A Beginners Guide to Data Warehousing and ETL Development. I am John Harbison and I will be your instructor on this course.
A little about me. Over the last 18 years I have been a programmer, data warehouse architect, etl developer, or some form of data ninja. During this time I have had the pleasure of training people in the art of data warehousing and I am looking forward to sharing this knowledge with you.
A quick run down of the course. We are going to review our “client requirements” and get our environment setup together. We are then going to the major components, tools, and parts going into our warehouse and ETL project. After all that we are going to dive right in and build out our warehouse and ETL. Our ETL is going to get built twice in this course, first as direct SQL loaders and then with SSIS. I prefer this method when learning so that you understand what is happening behind the scenes when your using the visual development tool. When we have a working warehouse and ETL, we are then going to cover everyday advanced topics that come up in real world projects.
I hope you enjoy the course and have a great takeaway from the time you have invested in learning data warehousing and ETL.
In this lecture we will review the source application database to figure out what we need to build.
In this lecture we will cover how to setup your environment and install the application database.
Parts of A Data Warehouse
In this lecture we will review common source data sets and how we plan for them.
In this lecture I talk about the stage environment: what it does and when you need it. I also cover some points about loading to and from it.
In this lecture I discuss the main components of the data warehouse.
In this lecture I cover a very important (and often overlooked) topic. Your implementation of Audit and Control can make or break your data project.
In this lecture I discuss the need and purpose of the Presentation database / layer.
In this lecture I cover the basics of Master Data and Master Data Management. Also are some examples of self implementing master data.
Building the Warehouse and ETL
Our first practical lecture. In this lecture we model the data warehouse. This involves a detailed look into the application database, and then after we discuss our modeling design choices we build the structure of the warehouse.
In this lecture we build ETL loaders using SQL scripting. These loaders are production ready if your not wanting to learn the visual ETL tool. If you are looking to move into the visual development tool, building the loaders first from script will give you insight into what the visual tool is doing behind the scenes and improve your ability to debug issues that arise.
Advanced Topics
In this lecture I explain what a Temporal table is and how to implement one from scratch or on existing tables.
In this lecture I discuss how to gain a major performance increase in your loaders by using MD5.
One thing I failed to mention in the video is if your being overwhelmed in your stage environment with rows that are not needed (application moves dates without change to row) you can also use MD5 in your ETL or CDC tool to cut down on that volume.
In this lecture we talk about the different kinds of delete operations that can happen in your source application and what you need to do to handle them.
In this lecture I discuss common issues you will encounter with unstructured data and some things you can try to easy the pain points.