4.45 out of 5
537 reviews on Udemy

Feature Engineering for Machine Learning

From beginner to advanced
Instructor:
Soledad Galli
3,516 students enrolled
Pre-process variables that contain missing data
Capture information from the missing values in your data
Work successfully with categorical variables
Convert labels of categorical variables into numbers that capture insight
Manipulate and transform numerical variables to extract the most predictive power
Transform date variables into insightful features
Apply different techniques of variable transformation to make features more predictive
Confidently clean and transform data sets for successful machine learning model building

Learn how to engineer features and build more powerful machine learning models.

This is the most comprehensive, yet easy-to-follow, course on feature engineering available online. Throughout this course, you will learn a variety of techniques used worldwide for data cleaning and feature transformation, gathered from data competition websites, white papers, scientific articles, and from the instructor's experience as a data scientist.

You will have at your fingertips, all in one place, a variety of techniques that you can apply to capture as much insight as possible from the features of your data set.

The course starts by describing the simplest and most widely used methods for feature engineering, and then moves on to more advanced and innovative techniques that automatically capture insight from your variables. Each technique is presented with its rationale, its advantages and limitations, and the assumptions it makes about the data. The course also includes full code that you can adapt and apply to your own data sets.

This course is suitable for complete beginners in data science taking their first steps in data pre-processing, as well as for intermediate and advanced data scientists seeking to level up their skills.

With more than 50 lectures and 10 hours of video, this comprehensive course covers every aspect of variable transformation. It includes several techniques for missing data imputation, categorical variable encoding, numerical variable transformation and discretisation, as well as how to extract useful features from date and time variables. Throughout the course we use Python as our main language, together with open-source packages for feature engineering, including the package "Feature Engine", which was specifically designed for this course.

This course comes with a 30-day money-back guarantee. In the unlikely event you don't find this course useful, you'll get your money back.

So what are you waiting for? Enrol today, embrace the power of feature engineering and build better machine learning models.

Introduction

1
Introduction

2
Course Curriculum Overview
3
Course Requirements
4
How to Approach this Course
5
Setting up your computer
6
Installing XGBoost in Windows
7
Download Course Presentations
8
Download Jupyter Notebooks
9
Download Datasets
10
FAQ: Data Science, Python Programming, Datasets, Presentations and more...

Variable Types

1
Variables | Intro
2
Numerical Variables
3
Categorical Variables
4
Date and Time Variables
5
Mixed Variables
6
Bonus: More about the Lending Club dataset

Variable Characteristics

1
Variable Characteristics
2
Missing Data
3
Cardinality - Categorical Variables
4
Rare Labels - Categorical Variables
5
Linear Models Assumptions
6
Variable Distribution
7
Outliers
8
Variable Magnitude
9
Bonus: Machine Learning Algorithms Overview

Table illustrating the advantages and disadvantages of different machine learning algorithms, as well as their requirements in terms of feature engineering, and common applications. 

10
Bonus: Additional Reading Resources
11
FAQ: How can I learn more about machine learning?

Engineering missing values (NA) in numerical variables

1
Complete Case Analysis

In this lecture, I describe complete case analysis: what it is, what assumptions it makes, and what the implications and consequences of handling missing values with this method are.
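
Complete case analysis simply discards every observation that has any missing value. A minimal sketch with pandas, using a toy DataFrame (not one of the course's datasets):

```python
import numpy as np
import pandas as pd

# Toy dataset with missing values (illustrative only)
df = pd.DataFrame({
    "age":    [25, np.nan, 40, 31, np.nan],
    "income": [50000, 62000, np.nan, 48000, 55000],
})

# Complete case analysis: keep only the rows with no missing values at all
complete_cases = df.dropna()

# 2 of the 5 rows have no missing values, so 3 rows are discarded
print(len(df), len(complete_cases))
```

Note how much data is lost: this is why the method assumes values are missing completely at random and missingness is limited to a small fraction of the data.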

2
Mean and median imputation

In this lecture, I describe what it means to replace missing values with the mean or median of the variable, what the assumptions, advantages and disadvantages are, and how this method may affect the performance of machine learning algorithms.
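
The idea can be sketched in a few lines of pandas; the variable name and values below are illustrative, not from the course's datasets:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [20.0, 30.0, np.nan, 50.0]})

# Learn the statistic from the observed (non-missing) values only
mean_value = df["age"].mean()      # (20 + 30 + 50) / 3
median_value = df["age"].median()  # 30

# Fill the missing entries with the learned statistic
df["age_mean_imputed"] = df["age"].fillna(mean_value)
df["age_median_imputed"] = df["age"].fillna(median_value)
```

In practice the statistic should be learned on the training set and reused on the test set, so that no information leaks from test to train.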

3
Random sample imputation (part 1)

In this lecture, I describe what random sample imputation is, its advantages, and the precautions that should be taken if this method is implemented in a business setting.

4
Random sample imputation (part 2)

Continues from the previous lecture: I describe what random sample imputation is, its advantages, and the precautions that should be taken if this method is implemented in a business setting.
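
Random sample imputation draws replacement values at random from the observed distribution of the variable. A minimal sketch with pandas (toy data, and note the fixed random seed, one of the precautions needed to make results reproducible in production):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [22.0, np.nan, 35.0, np.nan, 41.0]})

# Draw one random observed value per missing entry
n_missing = df["age"].isnull().sum()
sampled = df["age"].dropna().sample(n=n_missing, replace=True, random_state=0)

# Align the sampled values with the index positions of the missing entries
sampled.index = df[df["age"].isnull()].index
df.loc[df["age"].isnull(), "age"] = sampled
```

Because the filled values come from the observed data, the variable's distribution is preserved, which is the main advantage over mean or median imputation.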

5
Adding a variable to capture NA

Here I describe the process of adding one additional binary variable to flag the observations where data is missing.
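
The missing indicator is a one-liner in pandas; the column name below is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [29.0, np.nan, 44.0]})

# Binary flag: 1 where the original value was missing, 0 otherwise
df["age_was_missing"] = df["age"].isnull().astype(int)
```

The indicator is usually combined with one of the imputation methods above, so the model keeps both the filled value and the fact that it was missing.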

6
End of distribution imputation
7
Arbitrary value imputation

Engineering missing values (NA) in categorical variables

1
Frequent category imputation
2
Random sample imputation
3
Adding a variable to capture NA
4
Adding a category to capture NA

Bonus: More on engineering missing values

1
Overview of missing value imputation methods
2
Conclusion: when to use each NA imputation method

Engineering outliers in numerical variables

1
Top-coding, bottom-coding and zero-coding (part 1)

In this lecture I describe common methods to handle outliers in numerical variables. These methods are widely used in surveys as well as in other business settings.

2
Top-coding, bottom-coding and zero-coding (part 2)

This lecture continues from the previous one. 

I continue describing common methods to handle outliers in numerical variables. These methods are widely used in surveys as well as in other business settings.
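
All three techniques amount to clamping the variable at chosen bounds. A minimal sketch with pandas, where the quantile-based bounds are illustrative (the course discusses how to choose them):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])

# Top-coding caps values above an upper bound; bottom-coding caps values
# below a lower bound. Here the bounds are arbitrary quantiles.
upper = s.quantile(0.95)
lower = s.quantile(0.05)
capped = s.clip(lower=lower, upper=upper)

# Zero-coding clamps negative values to zero, e.g. for variables like age
# that cannot meaningfully be negative
z = pd.Series([-3.0, 7.0]).clip(lower=0)
```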

Engineering rare values in categorical variables

1
Engineering rare values (part 1)

In this lecture I will describe and compare two methods commonly used to replace rare labels. Rare labels are those categories within a categorical variable that contain very few observations, and therefore may affect the performance of tree-based machine learning algorithms.

In this lecture I will focus on variables with one predominant category.

2
Engineering rare values (part 2)

In this lecture I will describe and compare two methods commonly used to replace rare labels. Rare labels are those categories within a categorical variable that contain very few observations, and therefore may affect the performance of tree-based machine learning algorithms.

In this lecture I will focus on variables with few categories.

3
Engineering rare values (part 3)

In this lecture I will describe and compare two methods commonly used to replace rare labels. Rare labels are those categories within a categorical variable that contain very few observations, and therefore may affect the performance of tree-based machine learning algorithms.

In this lecture I will focus on variables with high cardinality.

4
Engineering rare values (part 4)

In this lecture I will focus on variables with several categories, using a different dataset, to get a better view of the benefits of engineering rare labels.
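
One common way to engineer rare values is to group all infrequent categories under a single umbrella label. A minimal sketch with pandas, where the 10% frequency threshold and the "Rare" label are illustrative choices:

```python
import pandas as pd

# Toy categorical variable: "C" and "D" appear only once each
s = pd.Series(["A"] * 10 + ["B"] * 8 + ["C"] + ["D"])

# Frequency of each category as a fraction of all observations
freq = s.value_counts(normalize=True)

# Replace categories below the threshold with the umbrella label "Rare"
threshold = 0.10
rare_labels = freq[freq < threshold].index
grouped = s.where(~s.isin(rare_labels), "Rare")
```

Grouping rare labels reduces cardinality and prevents a model from overfitting to categories seen only a handful of times.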

Engineer labels of categorical variables

1
One-hot-encoding
2
One-hot-encoding - variables with many labels
3
Ordinal numbering encoding
4
Count or frequency encoding
5
Target guided ordinal encoding
6
Mean encoding
7
Probability ratio encoding
8
Weight of evidence (WOE)
9
Comparison of categorical variable encoding
10
Bonus: Additional reading resources

Engineering mixed variables

1
Engineering mixed variables (part I)
2
Engineering mixed variables (part II)

Engineering dates

1
Engineering dates

Feature Scaling

1
Normalisation - Standardisation
2
Scaling to minimum and maximum values
3
Scaling to median and quantiles

Gaussian Transformation

1
Transformation with functions
2
Transformation with functions - Fare
3
Box Cox transformation

Discretisation

1
Equal frequency discretisation
2
Equal width discretisation
3
Domain knowledge discretisation
4
Discretisation with classification trees
5
Bonus: Additional reading resources

NEW: Engineering features with Feature_Engine

1
Introduction to Feature Engine and downloading the package
2
Feature Engine: missing value imputation
3
Feature Engine: categorical variable encoding
4
Feature Engine: variable discretisation
5
Feature Engine: outlier handling
6
Feature Engine: variable transformation

Putting it all together

1
Classification
2
Regression
3
New: Regression with Feature_Engine

Final section | Next steps

1
BONUS: Discounts on my other courses!

Detailed Rating

Stars 5
294
Stars 4
171
Stars 3
60
Stars 2
6
Stars 1
8
30-Day Money-Back Guarantee

Includes

7 hours on-demand video
32 articles
Full lifetime access
Access on mobile and TV
Certificate of Completion