At the end of this course, you will be able to:

Get your hands dirty by building machine learning models

Master linear and logistic regression, the workhorses of data science

Build your foundation for data science

Progress through a fast-paced course covering all the basic and intermediate-level concepts

Learn to manage data using standard tools like Pandas

This course is designed to get students on board with data science and make them ready to solve industry problems. It is a blend of data science foundations, industry standards, a broad understanding of machine learning, and practical applications.

Special emphasis is given to regression analysis. Linear and logistic regression are still the workhorses of data science. These two topics are the most basic machine learning techniques that everyone should understand very well. Concepts such as overfitting and regularization are discussed in detail. This fundamental understanding is crucial because it applies to almost every machine learning method.

This course also provides an understanding of industry standards and best practices for formulating, applying, and maintaining data-driven solutions. It starts with a basic explanation of machine learning concepts and how to set up your environment. Next, data wrangling and EDA with Pandas are covered through hands-on examples. Linear and logistic regression are then discussed in detail and applied to solve real industry problems. Learning industry-standard best practices and evaluating models for sustained development comes next.

The final lessons cover some of the core challenges and how to tackle them in an industry setting. This course supplies in-depth content that puts the theory into practice.

### Working with Machine Learning

At the end of this lecture, you will be able to:

- identify different types of machine learning methods.
- learn about the role of a data scientist.

This quiz will assess your overall understanding of machine learning.

At the end of this lecture, you will be able to:

- install Anaconda Python and get ready for Python programming
- run Python using the command-line interface, the Spyder IDE, and Jupyter Notebook
- select the right environment based on the problem at hand

About Python versions

At the end of this lecture, you will be able to:

- write and run code using Jupyter Notebook
- structure code in cells and document it with Markdown
- use the power of magic commands

This quiz will test your understanding of how to get started with Python programming.

### Understanding Data Wrangling

Introduction to Section 2. At the end of this section, you will be able to:

- load, manage, and analyze data using the Pandas DataFrame
- apply exploratory data analysis (EDA) methods to extract insights from the data
- visualize the results as different types of plots

At the end of this lecture you will be able to:

- load data from CSV (comma-separated values) files, or files with any other delimiter, as a pandas DataFrame
- set the loading parameters, such as the delimiter/separator, the encoding, and the index column
- plot the loaded data
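
A minimal sketch of this kind of loading step (the semicolon-delimited data here is made up for illustration; `read_csv` accepts a file path or any file-like object):

```python
import io
import pandas as pd

# Hypothetical delimiter-separated data standing in for a file on disk.
raw = io.StringIO("date;sales;region\n2023-01-01;100;N\n2023-01-02;120;S\n")

# Key loading parameters: sep (delimiter), encoding, and index_col.
df = pd.read_csv(raw, sep=";", index_col="date")

print(df.shape)           # rows x columns, excluding the index
# df.plot()               # plot the loaded data (requires matplotlib)
```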

This is an example based lecture. At the end of this lecture you will be able to:

- ask business questions and get the answers using a pandas DataFrame
- get a quick data summary from a pandas DataFrame
- select specific rows and columns from a pandas DataFrame (an operation also called slicing)
- generate count statistics from the data set
- plot a bar chart to visualize the result
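
A small sketch of these operations on a made-up data set:

```python
import pandas as pd

# Hypothetical sales data for illustration only.
df = pd.DataFrame({
    "city": ["NY", "NY", "LA", "SF", "LA"],
    "sales": [10, 20, 15, 30, 25],
})

summary = df.describe()               # quick numeric summary
subset = df.loc[df.index[:3], ["city"]]   # slicing rows and columns
counts = df["city"].value_counts()        # count statistics per city
# counts.plot(kind="bar")             # bar chart (requires matplotlib)
```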

This is an example based lecture. At the end of this lecture, you will be able to:

- filter a pandas DataFrame by one condition or by multiple conditions
- understand the internal structure of a DataFrame, such as its Series
- extract the raw data as an array
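
A minimal sketch of condition-based filtering (the data is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 32, 47, 51], "income": [40, 55, 80, 60]})

# Single condition
adults = df[df["age"] > 30]

# Multiple conditions combined with & or |, each wrapped in parentheses
mid = df[(df["age"] > 30) & (df["income"] < 70)]

# Each column is a Series; .to_numpy() extracts the raw array
ages = df["age"].to_numpy()
```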

This lecture is based on an example using cyclist data. At the end of this lecture, you will be able to:

- work with a copy of the data, such as a copy made after filtering by conditions
- add new columns to a pandas DataFrame
- use functions like groupby (similar to SQL) and aggregate
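
These steps can be sketched as follows (the rider counts below are hypothetical, not the lecture's actual cyclist data):

```python
import pandas as pd

df = pd.DataFrame({
    "bridge": ["A", "A", "B", "B"],
    "riders": [100, 150, 80, 120],
})

# .copy() avoids modifying a view when working with a filtered subset
busy = df[df["riders"] > 90].copy()
busy["double"] = busy["riders"] * 2          # add a new column

# groupby + aggregate, analogous to SQL GROUP BY
totals = df.groupby("bridge")["riders"].agg(["sum", "mean"])
```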

This is an example based lecture. At the end of this lecture, you will be able to:

- perform string operations on a pandas DataFrame column
- resample data from a DataFrame
- create joint plots to show more than one attribute at a time
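
A short sketch of the string and resampling operations (joint plots need a plotting library such as seaborn, so they are omitted; all values below are made up):

```python
import pandas as pd

# String operations work column-wise via the .str accessor
names = pd.Series(["Alice ", " bob"]).str.strip().str.title()

# Resampling aggregates a time-indexed series to a coarser frequency
idx = pd.date_range("2023-01-01", periods=6, freq="D")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)
every_3_days = s.resample("3D").sum()
```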

This lecture is based on an example. The example problem is to clean up a data set of service requests. At the end of this lecture, you will be able to:

- use string manipulation and other pandas functionality to identify messy data
- clean up a data set so that it is ready for use
- turn the data-cleaning routine into a function
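
One way to sketch "cleaning routine as a function" (the `clean` helper and its rules are hypothetical, not the lecture's actual routine):

```python
import pandas as pd

def clean(df):
    """Hypothetical cleaning routine: trim/normalize strings, drop missing rows."""
    out = df.copy()
    out["city"] = out["city"].str.strip().str.lower()
    return out.dropna()

messy = pd.DataFrame({"city": ["  NY", "LA ", None], "n": [1, 2, 3]})
tidy = clean(messy)
```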

This lecture is on handling date-time data using pandas. At the end of this lecture, you will be able to:

- use the date-time data format
- convert between different representations of the date-time format
- filter and sort a pandas DataFrame based on date-time
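
A minimal sketch, using made-up dates:

```python
import pandas as pd

df = pd.DataFrame({"when": ["2023-03-01", "2023-01-15", "2023-02-10"],
                   "value": [3, 1, 2]})

# Convert strings to the datetime64 dtype
df["when"] = pd.to_datetime(df["when"])

# Filter by a date threshold, then sort chronologically
recent = df[df["when"] >= "2023-02-01"].sort_values("when")
```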

At the end of this lecture, you will be able to:

- query from a SQL database
- write to a SQL database
- create a new table in the database from a pandas DataFrame
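
A self-contained sketch using an in-memory SQLite database (the table and rows are invented for illustration):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")   # throwaway in-memory database

df = pd.DataFrame({"id": [1, 2], "name": ["ann", "bob"]})

# Create a new table from a DataFrame ...
df.to_sql("users", conn, index=False)

# ... and query it back into a DataFrame
out = pd.read_sql("SELECT name FROM users WHERE id = 2", conn)
```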

Section conclusion

### Linear Regression

Introduction to Section 3. In this section, you will learn how to build linear regression models, the basic mathematics and statistics behind them, and business use cases.

At the end of this short lecture, you will have seen one example problem where linear regression can be used to obtain a solution: deciding how to split a marketing budget across different channels using the previous year's data.

This is an example based lecture. At the end of this lecture, you will learn:

- when to use linear regression and the difference between regression and classification problems
- more about the ad data that will be used to create regression models

At the end of this lecture, you will be able to:

- do a quick exploratory data analysis before starting a regression modeling exercise
- ask the relevant business questions before starting a regression modeling exercise

At the end of this lecture, you will be able to:

- clearly understand linear regression models and their coefficients
- use the statsmodels package for linear regression and fit the model to data
- use the scikit-learn package for linear regression and fit the model to data
- interpret the results of the modeling exercise
- predict using the models
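
The fit-then-predict workflow can be sketched with scikit-learn (statsmodels offers the same fit via `sm.OLS`; the noiseless toy data below, y = 2x + 1, is invented so the fit recovers the coefficients exactly):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X, y)   # learn slope and intercept from data
pred = model.predict([[5.0]])          # predict for an unseen input
```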

At the end of this lecture, you will be able to:

- run diagnostic tests for a linear regression model
- understand the p-value and how to use it to accept or reject an estimated coefficient

At the end of this lecture, you will be able to:

- understand what the R-squared value of a model is
- assess the quality of a regression model based on R-squared

At the end of this lecture, you will be able to:

- understand the multiple regression model
- use the statsmodels and scikit-learn packages to fit a multiple regression model
- check the summary of the fit
- accept or reject coefficients based on the p-value

At the end of this lecture, you will be able to:

- do feature selection for a multiple linear regression model
- understand the limitations of the p-value and R-squared
- use "adjusted R-squared" as an alternative to R-squared

At the end of this lecture, you will be able to:

- evaluate models using different metrics, such as MAE, MSE, and RMSE
- evaluate models by splitting data into train and test sets
- use the .predict function to predict with the fitted model
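
The three metrics can be sketched directly (the true/predicted values below are made up; RMSE is simply the square root of MSE):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)   # mean of |error|
mse = mean_squared_error(y_true, y_pred)    # mean of error^2
rmse = np.sqrt(mse)                         # same units as the target
```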

At the end of this lecture, you will be able to:

- use categorical attributes as part of a linear regression model
- understand the concept of dummy variables, and use dummy encoding of attributes to build linear regression models
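
Dummy encoding can be sketched with `pd.get_dummies` (the `market` column below is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"tv": [10, 20, 30], "market": ["urban", "rural", "urban"]})

# get_dummies expands a categorical column into 0/1 indicator columns;
# drop_first=True avoids the dummy-variable trap (perfect collinearity)
encoded = pd.get_dummies(df, columns=["market"], drop_first=True)
```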

At the end of this lecture, you will know about a few deeper concepts that are not covered in this section, along with references where you can learn more about them.

### Logistic Regression

This is the introduction to the section on logistic regression. In this section:

- linear regression will be revised
- the model of logistic regression will be introduced
- log odds and probability will be discussed
- how to interpret logistic regression results will be discussed

This is a refresher on linear models: plotting independent and dependent variables, building a linear model using scikit-learn, and plotting the fitted values.

The refresher on linear models continues: how to predict from a model and how to interpret its coefficients.

At the end of this lecture, you will be able to:

- identify problems with categorical variables
- transform attributes/responses into a categorical type
- use linear regression as a tool for classification. Note: linear regression is a sub-optimal tool for classification; you will learn about logistic regression later in this section.

At the end of this lecture, you will be able to:

- create a logistic regression model using scikit-learn
- check the probabilities of belonging to each class
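
Both steps can be sketched on a toy binary problem (the data below is invented: class 1 when the single feature is large):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# predict_proba returns one column per class: P(class 0), P(class 1)
proba = clf.predict_proba([[5.0]])
```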

At the end of this lecture, you will have learned some foundations required for logistic regression, such as:

- 'probability', 'odds', and the relation between 'probability' and 'odds'
- the 'exponential function' and 'natural logarithm' and their relation
- the log of the odds

These concepts will help you understand logistic regression in the next lecture.
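
The probability/odds/log-odds relations can be sketched in a few lines (the value p = 0.8 is arbitrary):

```python
import math

p = 0.8                       # probability of the event
odds = p / (1 - p)            # odds = p / (1 - p) = 4.0
log_odds = math.log(odds)     # the "logit"

# exp and log undo each other, so the mapping is invertible:
p_back = math.exp(log_odds) / (1 + math.exp(log_odds))
```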

At the end of this lecture, you will be able to:

- understand the mathematical model of logistic regression
- understand a few key characteristics of the logistic regression output
- extend logistic regression to multi-class problems (where there are more than two categories)

At the end of this lecture, you will be able to:

- interpret the results of a logistic regression
- compute the probabilities using built-in scikit-learn functions
- understand what happens when model coefficients increase or decrease

At the end of this lecture, you will be able to:

- build logistic regression models using categorical variables in the features

Summary of learnings from this section

### Cross Validation

Introduction to the section on cross validation. This is an important section because cross validation is used for parameter selection, model tuning, and comparison between different machine learning models.

At the end of this lecture, you will be able to:

- split data into train and test sets and use them for evaluating machine learning models
- use the scikit-learn package for splitting the data
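
The split can be sketched with `train_test_split` (the arrays are placeholder data; `random_state` makes the split reproducible):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y = np.arange(10)

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```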

At the end of this lecture, you will be able to:

- understand the foundations of K-fold cross validation
- use scikit-learn methods for creating the partitions for K-fold cross validation
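
A minimal sketch of the K-fold partitioning itself (six placeholder samples, three folds; each sample appears in the test set exactly once):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(12).reshape(6, 2)
kf = KFold(n_splits=3, shuffle=True, random_state=0)

# Each entry is a (train indices, test indices) pair
folds = list(kf.split(X))
```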

At the end of this lecture, you will be able to:

- understand the advantages and disadvantages of cross validation (CV) and train-test split methods
- use best practices for implementing CV
- use CV for model selection
- use CV for feature selection

At the end of this lecture, you will be able to:

- use the repeated cross validation technique for better results
- keep a hold-out set for out-of-sample validation

You will also get a list of references for further reading.

### Regularization

In this section, you will learn about regularization: the problem of overfitting, how to regularize linear and logistic regression models, and the difference between regularized and un-regularized solutions.

At the end of this lecture, you will be able to:

- understand the concept of overfitting
- check for the primary causes of overfitting
- identify the impact of overfitting on your models

At the end of this lecture, you will be able to:

- understand the characteristics of good linear models
- check for bias and variance in a model, and understand the bias-variance trade-off
- avoid overfitting by addressing its root cause

At the end of this lecture, you will be able to:

- understand the concept of regularization
- visualize the bias-variance trade-off and how to control bias and variance
- get a thorough understanding of Ridge and LASSO regularization
- select between Ridge and LASSO regularization based on the problem
- use best practices regarding regularization

This lecture is heavy on theory. At the end of it, you will understand regularization from its foundations, with a geometric interpretation.

Students not interested in the theory can skip this lecture. The applications of regularization are shown in the next section.

This is an example based lecture. At the end of this lecture, you will be able to:

- regularize a solution using Ridge and LASSO
- use scikit-learn modules for regularizing linear regression
- select the regularization parameter using cross validation
- use the RidgeCV function to select the regularization parameter automatically
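
These pieces can be sketched together (the synthetic data below has true coefficients [1.5, 0, -2], so LASSO's L1 penalty tends to zero out the useless middle feature):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(scale=0.1, size=50)

ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=0.5).fit(X, y)    # L1 penalty: can zero them out

# RidgeCV picks the regularization parameter by cross validation
auto = RidgeCV(alphas=[0.01, 0.1, 1.0]).fit(X, y)
```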

At the end of this lecture, you will be able to:

- regularize classification problems
- use regularization with logistic regression
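
A short sketch of regularized logistic regression (toy data as before; note that in scikit-learn, `C` is the inverse regularization strength, so a smaller `C` means stronger regularization and smaller coefficients):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

weak = LogisticRegression(C=100.0).fit(X, y)   # weak regularization
strong = LogisticRegression(C=0.01).fit(X, y)  # strong regularization
```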

At the end of this lecture, you will be able to:

- use the Pipeline and GridSearchCV methods of scikit-learn for cross validation
- use automated cross validation schemes to find the best regularization type and parameter
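
A compact sketch of combining the two (synthetic data for illustration; inside a Pipeline, hyperparameter names use the step name plus a double underscore, e.g. `model__alpha`):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=40)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

# GridSearchCV tries every candidate alpha with 5-fold cross validation
grid = GridSearchCV(pipe, {"model__alpha": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
best_alpha = grid.best_params_["model__alpha"]
```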

In this lecture, you will understand the difference between a regularized and an un-regularized solution, and their respective advantages and disadvantages.