Building Recommender Systems with Machine Learning and AI
Learn how to build recommender systems from one of Amazon’s pioneers in the field. Frank Kane spent over nine years at Amazon, where he managed and led the development of many of Amazon’s personalized product recommendation technologies.
You’ve seen automated recommendations everywhere – on Netflix’s home page, on YouTube, and on Amazon as these machine learning algorithms learn about your unique interests, and show the best products or content for you as an individual. These technologies have become central to the largest, most prestigious tech employers out there, and by understanding how they work, you’ll become very valuable to them.
We’ll cover tried and true recommendation algorithms based on neighborhood-based collaborative filtering, and work our way up to more modern techniques including matrix factorization and even deep learning with artificial neural networks. Along the way, you’ll learn from Frank’s extensive industry experience to understand the real-world challenges you’ll encounter when applying these algorithms at large scale and with real-world data.
Recommender systems are complex; don’t enroll in this course expecting a learn-to-code type of format. There’s no recipe to follow on how to make a recommender system; you need to understand the different algorithms and how to choose when to apply each one for a given situation. We assume you already know how to code.
However, this course is very hands-on; you’ll develop your own framework for evaluating and combining many different recommendation algorithms, and you’ll even build your own neural networks using TensorFlow to generate recommendations from real-world movie ratings from real people. We’ll cover:
Building a recommendation engine
Evaluating recommender systems
Content-based filtering using item attributes
Neighborhood-based collaborative filtering with user-based, item-based, and KNN CF
Model-based methods including matrix factorization and SVD
Applying deep learning, AI, and artificial neural networks to recommendations
Session-based recommendations with recurrent neural networks
Scaling to massive data sets with Apache Spark machine learning, Amazon DSSTNE deep learning, and AWS SageMaker with factorization machines
Real-world challenges and solutions with recommender systems
Case studies from YouTube and Netflix
Building hybrid, ensemble recommenders
This comprehensive course takes you all the way from the early days of collaborative filtering, to bleeding-edge applications of deep neural networks and modern machine learning techniques for recommending the best items to every individual user.
The coding exercises in this course use the Python programming language. We include an intro to Python if you’re new to it, but you’ll need some prior programming experience in order to use this course successfully. We also include a short introduction to deep learning if you are new to the field of artificial intelligence, but you’ll need to be able to understand new computer algorithms.
High-quality, hand-edited English closed captions are included to help you follow along.
I hope to see you in the course soon!
After a brief introduction to the course, we'll dive right in and install what you need: Anaconda (your Python development environment), the course materials, and the MovieLens data set of 100,000 real movie ratings from real people. We'll then run a quick example to generate movie recommendations using the SVD algorithm, to make sure it all works!
We'll lay out the structure of the course so you know what to expect later on (and when you'll start writing some code of your own). We'll also provide advice on how to navigate this course depending on your prior experience.
There are many different flavors of recommender systems, and you encounter them every day. Let's review some of the applications of recommender systems in the real world.
How do recommender systems learn about your individual tastes and preferences? We'll explain how both explicit ratings and implicit ratings work, and the strengths and weaknesses of both.
Most real-world recommender systems are "Top-N" systems that produce a list of top results for each individual. There are a couple of main architectural approaches to building them, which we'll review here.
We'll review what we've covered in this section with a quick 4-question quiz, and discuss the answers.
Introduction to Python [Optional]
After installing Jupyter Notebook, we'll cover the basics of what's different about Python, including its use of whitespace. We'll dissect a simple function to get a feel for what Python code looks like.
We'll look at using lists, tuples, and dictionaries in Python.
We'll see how to define a function in Python, and how Python lets you pass functions to other functions. We'll also look at a simple example of a Lambda function.
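As a minimal sketch of passing functions to other functions, consider the following. The names apply_twice and square are made up for illustration and are not taken from the course materials.

```python
def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))


def square(x):
    """An ordinary named function."""
    return x * x


# Pass a named function as an argument: square(square(3)).
result_named = apply_twice(square, 3)

# Pass an anonymous (lambda) function instead: (3 + 1) + 1.
result_lambda = apply_twice(lambda x: x + 1, 3)

print(result_named, result_lambda)
```

A lambda is handy when the function is short and only needed in one place; anything longer usually reads better as a named function.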
We'll look at how Boolean expressions work in Python as well as loops. Then, we'll give you a challenge to write a simple Python function on your own!
Evaluating Recommender Systems
Learn about different testing methodologies for evaluating recommender systems offline, including train/test, K-Fold Cross Validation, and Leave-One-Out cross-validation.
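To make leave-one-out concrete, here is a rough sketch that holds back one randomly chosen rating per user for testing and trains on the rest. The function name, the (user, item, rating) tuple layout, and the sample data are assumptions for illustration; the course's own framework may organize this differently.

```python
import random


def leave_one_out_split(ratings, seed=0):
    """Hold out one random rating per user.

    ratings is a list of (user, item, rating) tuples.
    Returns (train, test), where test has exactly one rating per user.
    """
    rng = random.Random(seed)

    # Group the ratings by user.
    by_user = {}
    for row in ratings:
        by_user.setdefault(row[0], []).append(row)

    train, test = [], []
    for rows in by_user.values():
        held_out_index = rng.randrange(len(rows))
        test.append(rows[held_out_index])
        train.extend(rows[:held_out_index] + rows[held_out_index + 1:])
    return train, test


# Hypothetical sample data.
sample_ratings = [
    ("ann", "Star Wars", 5.0), ("ann", "Alien", 4.0), ("ann", "Titanic", 1.0),
    ("bob", "Star Wars", 4.0), ("bob", "Titanic", 2.0),
    ("cat", "Titanic", 5.0), ("cat", "Alien", 2.0),
]
train_set, test_set = leave_one_out_split(sample_ratings)
print(len(train_set), "training ratings;", len(test_set), "held out")
```

A recommender trained on train_set can then be judged by whether each user's held-out item shows up in that user's top-N list.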
Learn about Root Mean Squared Error, Mean Absolute Error, and why we use these measures of recommendation prediction accuracy.
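Both measures compare predicted ratings against the ratings users actually gave. A small sketch, using made-up numbers, shows the difference: MAE averages the absolute errors, while RMSE squares them first and so penalizes large misses more heavily.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error: the average size of the prediction errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: like MAE, but large errors count for more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical predictions and true ratings; the last prediction is off by 3 stars.
predicted = [4.0, 3.0, 5.0, 1.0]
actual = [5.0, 3.0, 4.0, 4.0]

mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
print(mae_value, rmse_value)
```

Here the single 3-star miss pulls RMSE noticeably above MAE, which is the behavior you want if badly wrong predictions are more harmful than slightly wrong ones.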
Learn about several ways to measure the accuracy of top-N recommenders, including hit rate, cumulative hit rate, average reciprocal hit rank, rating hit rate, and more.
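Hit rate and average reciprocal hit rank can be sketched in a few lines, assuming a leave-one-out setup where one known-liked item per user was withheld from training. The data structures and sample values here are illustrative assumptions, not the course's RecommenderMetrics module.

```python
def hit_rate(top_n, left_out):
    """Fraction of users whose withheld item appears in their top-N list.

    top_n maps user -> ranked list of recommended items.
    left_out maps user -> the single item withheld from training.
    """
    hits = sum(1 for user, item in left_out.items() if item in top_n.get(user, []))
    return hits / len(left_out)


def average_reciprocal_hit_rank(top_n, left_out):
    """Like hit rate, but a hit at rank r only earns 1/r credit."""
    total = 0.0
    for user, item in left_out.items():
        ranked = top_n.get(user, [])
        if item in ranked:
            total += 1.0 / (ranked.index(item) + 1)
    return total / len(left_out)


# Hypothetical top-3 lists and withheld items.
top_n = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"], "u3": ["g", "h", "i"]}
left_out = {"u1": "a", "u2": "e", "u3": "z"}

hr = hit_rate(top_n, left_out)                       # u1 and u2 hit, u3 misses
arhr = average_reciprocal_hit_rank(top_n, left_out)  # (1/1 + 1/2 + 0) / 3
print(hr, arhr)
```

ARHR rewards putting the hit near the top of the list, which matters when users only look at the first few results.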
Learn how to measure the coverage of your recommender system, how diverse its results are, and how novel its results are.
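Definitions of these metrics vary from one system to another. The sketch below assumes one simple reading of two of them: coverage as the share of the catalog that appears in at least one top-N list, and novelty as the mean popularity rank of recommended items (a higher rank number means a more obscure item). The course's own module may define them differently.

```python
def catalog_coverage(top_n, catalog):
    """Share of catalog items that show up in at least one top-N list."""
    recommended = {item for items in top_n.values() for item in items}
    return len(recommended & set(catalog)) / len(catalog)


def novelty(top_n, popularity_rank):
    """Mean popularity rank of everything recommended (1 = most popular)."""
    ranks = [popularity_rank[item] for items in top_n.values() for item in items]
    return sum(ranks) / len(ranks)


# Hypothetical data: four items in the catalog, ranked by popularity.
catalog = ["a", "b", "c", "d"]
popularity_rank = {"a": 1, "b": 2, "c": 3, "d": 4}
top_n = {"u1": ["a", "b"], "u2": ["a", "c"]}

coverage_value = catalog_coverage(top_n, catalog)  # "d" is never recommended
novelty_value = novelty(top_n, popularity_rank)
print(coverage_value, novelty_value)
```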
Measure how often your recommendations change (churn), how quickly they respond to new data (responsiveness), and why no metric matters more than the results of real, online A/B tests. We'll also talk about perceived quality, where you explicitly ask your users to rate your recommendations.
In this short quiz, we'll review what we've learned about different ways to measure the qualities and accuracy of your recommender system.
Let's walk through this course's Python module for implementing the metrics we've discussed in this section on real recommender systems.
We'll walk through our sample code to apply our RecommenderMetrics module to a real SVD recommender using real MovieLens rating data, and measure its performance in many different ways.
After running TestMetrics.py, we'll look at the results for our SVD recommender, and discuss how to interpret them.
A Recommender Engine Framework
Let's review the architecture of our recommender engine framework, which will let us easily implement, test, and compare different algorithms throughout the rest of this course.
In part one of the code walkthrough of our recommender engine, we'll see how it's used, and dive into the Evaluator class.
In part two of the walkthrough, we'll dive into the EvaluationData class, and kick off a test with the SVD recommender.
Wrapping up our review of our recommender system architecture, we'll look at the results of using our framework to evaluate the SVD algorithm, and interpret them.
We'll talk about how content-based recommendations work, and introduce the cosine similarity metric. Cosine scores will be used throughout the course, and understanding their mathematical basis is important.
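The cosine similarity between two vectors a and b is their dot product divided by the product of their lengths. A minimal sketch, assuming each movie is encoded as a 0/1 vector over a hypothetical list of genres:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a movie with no genres is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical genre order: [Action, Comedy, Sci-Fi, Romance]
space_opera = [1, 0, 1, 0]
alien_thriller = [1, 0, 1, 0]
rom_com = [0, 1, 0, 1]
action_comedy = [1, 1, 0, 0]

same_genres = cosine_similarity(space_opera, alien_thriller)  # close to 1
no_overlap = cosine_similarity(space_opera, rom_com)          # 0
partial = cosine_similarity(space_opera, action_comedy)       # about 0.5
print(same_genres, no_overlap, partial)
```

With non-negative genre vectors the score always falls between 0 (nothing in common) and 1 (identical genre sets).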
We'll cover how to factor time into our content-based recs, and how the concept of KNN will allow us to make rating predictions just based on similarity scores based on genres and release dates.
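One way to sketch both ideas: turn the gap between release years into a similarity score with an exponential decay, then predict a rating as the similarity-weighted average of the user's k most similar rated movies. The decay scale of 10 years, the function names, and the sample numbers are assumptions for illustration.

```python
import heapq
import math


def year_similarity(year_a, year_b, scale=10.0):
    """1.0 for the same year, decaying toward 0 as the gap grows.

    The 10-year scale is an arbitrary illustrative choice.
    """
    return math.exp(-abs(year_a - year_b) / scale)


def predict_rating(neighbors, k=2):
    """Similarity-weighted average rating over the k most similar items.

    neighbors is a list of (similarity, rating) pairs, one per movie the
    user has already rated. Returns None if no neighbor has any similarity.
    """
    nearest = heapq.nlargest(k, neighbors, key=lambda pair: pair[0])
    sim_total = sum(sim for sim, _ in nearest)
    if sim_total == 0:
        return None
    return sum(sim * rating for sim, rating in nearest) / sim_total


# Hypothetical (similarity to the target movie, user's rating) pairs.
neighbors = [(0.9, 5.0), (0.1, 1.0), (0.6, 4.0)]
predicted = predict_rating(neighbors, k=2)  # uses the 0.9 and 0.6 neighbors
print(predicted, year_similarity(2000, 2010))
```

In practice the similarity passed in could be the genre score multiplied by the year score, so that a movie must match on both to count as a close neighbor.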
We'll look at some code for producing movie recommendations based on their genres and years, and evaluate the results using the MovieLens data set.
In our first "bleeding edge alert," we'll examine the use of Mise en Scene data for providing additional content-based information to our recommendations. And, we'll turn the idea into code, and evaluate the results.
In two different hands-on exercises, dive into which content attributes provide the best recommendations - and try augmenting our content-based recommendations using popularity data.
Neighborhood-Based Collaborative Filtering
Similarity between users or items is at the heart of all neighborhood-based approaches; we'll discuss how similarity measures fit into our architecture, and the effect data sparsity has on it.
We'll cover different ways of measuring similarity, including cosine, adjusted cosine, Pearson, Spearman, Jaccard, and more - and how to know when to use each one.
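Two of these measures are easy to sketch side by side. Jaccard only looks at which items two users have in common and ignores the rating values, while Pearson compares how each user's ratings deviate from that user's own mean over the items both have rated. The user names and ratings below are hypothetical.

```python
import math


def jaccard(items_a, items_b):
    """Size of the overlap divided by the size of the union of two item sets."""
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    ratings_a and ratings_b map item -> rating.
    """
    common = set(ratings_a) & set(ratings_b)
    n = len(common)
    if n < 2:
        return 0.0  # not enough shared ratings to measure a trend
    mean_a = sum(ratings_a[i] for i in common) / n
    mean_b = sum(ratings_b[i] for i in common) / n
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # one user gave every shared item the same rating
    return cov / math.sqrt(var_a * var_b)


alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 3, "m3": 2, "m4": 5}

jaccard_value = jaccard(set(alice), set(bob))  # 3 shared out of 4 total
pearson_value = pearson(alice, bob)            # same ordering of tastes
print(jaccard_value, pearson_value)
```

Bob rates on a narrower scale than Alice, but Pearson still scores them as perfectly correlated because it measures each rating relative to the user's own average.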
We'll illustrate how user-based collaborative filtering works, where we recommend stuff that people similar to you liked.
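The idea can be sketched in three steps: score every other user by similarity to you, keep the k most similar, and rank the items they rated (and you haven't) by similarity-weighted rating. This is a simplified illustration with made-up data and helper names, not the course's implementation.

```python
import math
from collections import defaultdict


def cosine(ratings_a, ratings_b):
    """Cosine similarity over the items two users have both rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, k=2, n=3):
    """Top-n unseen items, scored by the k most similar users' ratings."""
    similar_users = sorted(
        ((cosine(ratings[user], other_ratings), other)
         for other, other_ratings in ratings.items() if other != user),
        reverse=True,
    )[:k]

    scores = defaultdict(float)
    for similarity, other in similar_users:
        for item, rating in ratings[other].items():
            if item not in ratings[user]:  # skip what the user already rated
                scores[item] += similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"Star Wars": 5, "Alien": 4},
    "bob": {"Star Wars": 5, "Alien": 4, "Blade Runner": 5},
    "cat": {"Star Wars": 1, "Alien": 2, "Titanic": 5},
}
recs = recommend("ann", ratings)
print(recs)
```

Bob's tastes match Ann's exactly on the movies they share, so his other favorite outranks the one suggested by Cat, whose ratings line up less closely.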
Let's write some code to apply user-based collaborative filtering to the MovieLens data set, run it, and evaluate the results.
We'll talk about the advantages of flipping user-based collaborative filtering on its head, to give us item-based collaborative filtering - and how it works.
Let's write, run, and evaluate some code to apply item-based collaborative filtering to generate recommendations from the MovieLens data set, and compare it to user-based CF.
In this exercise, you're challenged to improve upon the user-based and item-based collaborative filtering algorithms we presented, by tweaking the way candidate generation works.
Since collaborative filtering does not make rating predictions, evaluating it offline is challenging - but we can test it with hit rate metrics and leave-one-out cross-validation, which we'll do in this activity.
In the previous activity, we measured the hit rate of a user-based collaborative filtering system. Your challenge is to do the same for an item-based system.
Learn how the ideas of neighborhood-based collaborative filtering can be applied to frameworks based on rating predictions, with K-Nearest-Neighbor recommenders.
Let's use SurpriseLib to quickly run user-based and item-based KNN on our MovieLens data, and evaluate the results.
Try different similarity measures to see if you can improve on the results of KNN - and we'll talk about why this is so challenging.
In our next "bleeding edge alert," we'll discuss Translation-Based Recommendations - an idea unveiled at the 2017 RecSys conference for recommending sequences of events, based on vectors in item similarity space.
Matrix Factorization Methods
Let's learn how PCA allows us to reduce higher-dimensional data into lower dimensions, which is the first step toward understanding SVD.
We'll extend PCA to the problem of making movie recommendations, and learn how SVD is just a specific implementation of PCA.
Let's run SVD and SVD++ on our MovieLens movie ratings data set, and evaluate the results. They're really good!
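For intuition about what SVD-style recommenders learn, here is a deliberately simplified sketch: each user and each movie gets a short vector of latent factors, and stochastic gradient descent nudges those vectors until their dot product (plus the global mean rating) approximates the known ratings. This is not the SurpriseLib implementation; the function names, hyperparameter values, and data are assumptions for illustration.

```python
import math
import random


def train_factors(ratings, n_factors=2, n_epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn user and item factor vectors with SGD on (user, item, rating) tuples."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Start from small random factors.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mean + sum(a * b for a, b in zip(p[u], q[i])))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Move both factors to shrink the error; reg keeps them small.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mean, p, q


def predict(model, user, item):
    """Predicted rating: global mean plus the dot product of the factors."""
    mean, p, q = model
    return mean + sum(a * b for a, b in zip(p[user], q[item]))


def training_rmse(model, ratings):
    squared = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared / len(ratings))


# Hypothetical ratings: two sci-fi fans and one romance fan.
ratings = [
    ("ann", "Star Wars", 5), ("ann", "Alien", 4), ("ann", "Titanic", 1),
    ("bob", "Star Wars", 4), ("bob", "Alien", 5), ("bob", "Titanic", 2),
    ("cat", "Star Wars", 1), ("cat", "Alien", 2), ("cat", "Titanic", 5),
]
model = train_factors(ratings)

# Baseline: always predict the global mean rating.
mean_rating = model[0]
baseline_rmse = math.sqrt(sum((r - mean_rating) ** 2 for _, _, r in ratings) / len(ratings))
trained_rmse = training_rmse(model, ratings)
print(baseline_rmse, trained_rmse)
```

The error here is measured on the training data only, so it shows that the factors fit the ratings, not that they generalize; a held-out test set is needed for that.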
We'll talk about some variants and extensions to SVD that have emerged, and the importance of hyperparameter tuning on SVD, as well as how to tune parameters in SurpriseLib using the GridSearchCV class.
Have a go at modifying our SVD bake-off code to find the optimal values of the various hyperparameters for SVD, and see if it makes a difference in the results.
We'll cover some exciting research from the University of Minnesota based on matrix factorization.
Introduction to Deep Learning [Optional]
A quick introduction on what to expect from this section, and who can skip it.
We'll cover the concepts of Gradient Descent, Reverse Mode AutoDiff, and Softmax, which you'll need to build deep neural networks.
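Two of these concepts fit in a few lines of plain Python. Gradient descent repeatedly steps against the gradient of a function to find a minimum, and softmax turns a list of raw scores into probabilities that sum to 1. The toy function being minimized and the parameter values are chosen only for illustration.

```python
import math


def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Minimize a one-variable function given a function for its gradient."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)  # step downhill
    return x


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the largest score avoids overflow without changing the result.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)

probabilities = softmax([1.0, 2.0, 3.0])
print(minimum, probabilities)
```

In a neural network the same descent step is applied to every weight at once, with the gradients supplied by automatic differentiation rather than written by hand.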
We'll cover the evolution of neural networks from their origin in the 1940s, all the way up to the architecture of modern deep neural networks.
We'll use the TensorFlow Playground to get a hands-on feel for how deep neural networks operate, and the effects of different topologies.
We'll cover the mechanics of different activation functions and optimization functions for neural networks, including ReLU, Adam, RMSProp, and Gradient Descent.
We'll talk about how to prevent overfitting using techniques such as dropout layers, and how to tune your topology for the best results.
We'll walk through an example of using TensorFlow's low-level API to distribute the processing of neural networks using Python.
In this hands-on activity, we'll implement handwriting recognition on real data using TensorFlow's low-level API. Part 1 of 3.
In this hands-on activity, we'll implement handwriting recognition on real data using TensorFlow's low-level API. Part 2 of 3.
In this hands-on activity, we'll implement handwriting recognition on real data using TensorFlow's low-level API. Part 3 of 3.
Keras is a higher-level API that makes developing deep neural networks with TensorFlow a lot easier. We'll explain how it works and how to use it.
We'll tackle the same handwriting recognition problem as before, but this time using Keras with much simpler code, and better results.
There are different patterns to use in Keras for multi-class or binary classification problems; we'll talk about how to tackle each.
As an exercise challenge, develop your own neural network using Keras to predict the political parties of politicians, based just on their votes on 16 different issues.
We'll talk about how your brain's visual cortex recognizes images seen by your eyes, and how the same approach inspires artificial convolutional neural networks.
The topology of CNNs can get complicated, and there are several variations of them you can choose from for certain problems, including LeNet, GoogLeNet, and ResNet.
We'll tackle handwriting recognition again, this time using Keras and CNNs for our best results yet. Can you improve upon them?
Recurrent Neural Networks are appropriate for sequences of information, such as time series data, natural language, or music. We'll dive into how they work and some variations of them.
Training RNNs involves back-propagating through time, which makes them extra-challenging to work with.
We'll wrap up our intro to deep learning by applying RNNs to the problem of sentiment analysis, which can be modeled as a sequence-to-vector learning problem.
Deep Learning for Recommender Systems
We'll introduce the idea of using neural networks to produce recommendations, and explore whether this concept is overkill or not.
We'll cover a very simple neural network called the Restricted Boltzmann Machine, and show how it can be used to produce recommendations given sparse rating data.
We'll walk through our implementation of Restricted Boltzmann Machines integrated into our recommender framework. Part 1 of 2.
We'll walk through our implementation of Restricted Boltzmann Machines integrated into our recommender framework. Part 2 of 2.
We'll run our RBM recommender, and study its results.
You're challenged to tune the RBM using GridSearchCV to see if you can improve its results.
We'll review my results from the previous exercise, so you can compare them against your own.
We'll learn how to apply modern deep neural networks to recommender systems, and the challenges sparse data creates.
We'll walk through our code for producing recommendations with deep learning, and evaluate the results.
We'll introduce "GRU4Rec," a technique that applies recurrent neural networks to the problem of clickstream recommendations.
As a more challenging exercise that mimics what you might do in the real world, try to port some older research code into a modern Python and TensorFlow environment, and get it running.
We'll review my results from the previous exercise.
We'll explore DeepFM, which combines the strengths of Factorization Machines and of Deep Neural Networks to produce a hybrid solution that outperforms either technique.
We'll cover a few more "bleeding edge" topics, including Word2Vec, 3D CNNs for session-based recommendations, and feature extraction with CNNs.
Scaling it Up
We'll introduce Apache Spark as our first means of "scaling it up," and get it installed on your system if you want to experiment with it.
We'll explain just enough about how Spark works to let you understand how it distributes its work across a cluster, and the main objects our sample code will use: RDDs and DataFrames.
We'll start by using Spark's MLlib to generate recommendations with ALS for our ml-100k data set.
We'll scale things up, and use all of the cores on our local PC to process 20 million ratings and produce top-N recommendations with Apache Spark.
Amazon open-sourced its recommender engine called DSSTNE, which makes it easy to apply deep neural networks to massive, sparse data sets and produce great recommendations at large scale.
Watch as we use Amazon DSSTNE on an EC2 Ubuntu instance to produce movie recommendations using a deep neural network.
Let's explore how Amazon scaled DSSTNE up, paired with Apache Spark, to process their massive data and produce recommendations for millions of customers.
Amazon's SageMaker service offers some machine learning algorithms that can be used for recommendations, including factorization machines.