4.14 out of 5
65 reviews on Udemy

Beginning with Machine Learning & Data Science in Python

Fundamentals of Data Science: Exploratory Data Analysis (EDA), Regression (Linear & Logistic), Visualization, Basic ML
Instructor:
UNP United Network of Professionals
7,108 students enrolled
English [Auto-generated]
You will be able to apply data science algorithms for solving industry problems
You will have a clear understanding of industry standards and best practices for predictive model building
You will be able to derive key insights from data using exploratory data analysis techniques
You will be able to efficiently handle data in a structured way using Pandas
You will have a strong foundation of linear regression, multiple regression and logistic regression
You will be able to use Python's scikit-learn library to build different types of regression models
You will be able to use cross-validation techniques to compare models and select parameters
You will know about common modeling pitfalls such as overfitting and the bias-variance trade-off
You will be able to regularize models for reliable predictions
According to UNP, about 85% of data science problems are solved using exploratory data analysis (EDA), visualization and regression (linear and logistic), and a similar share of interview questions come from these topics.
This is a concise course created by UNP to focus on what matters most. It will help you build a solid foundation in the essential topics of data science. With that foundation, you will be able to understand new methods more easily and create your own predictive analytics models.

At the end of this course, you will be able to:

  • Get your hands dirty by building machine learning models

  • Master linear and logistic regression, the workhorses of data science

  • Build your foundation for data science

  • Cover all the basic and intermediate concepts at a fast pace

  • Learn to manage data using standard tools like Pandas

This course is designed to get students on board with data science and make them ready to solve industry problems. This course is a perfect blend of foundations of data science, industry standards, broader understanding of machine learning and practical applications.

Special emphasis is given to regression analysis. Linear and logistic regression are still the workhorses of data science; they are the most basic machine learning techniques, and everyone should understand them very well. Concepts such as overfitting and regularization are discussed in detail. This foundation is crucial because the same ideas apply to almost every machine learning method.

This course also provides an understanding of industry standards and best practices for formulating, applying and maintaining data-driven solutions. It starts with a basic explanation of machine learning concepts and how to set up your environment. Next, data wrangling and EDA with Pandas are covered with hands-on examples. Linear and logistic regression are then discussed in detail and applied to real industry problems. Industry-standard best practices and model evaluation for sustained development come next.

The final lessons cover some of the core challenges and how to tackle them in an industry setting. The course supplies in-depth content that puts the theory into practice.

Working with Machine Learning

1
Exploring Machine Learning and its Types

At the end of this lecture, you will be able to:

  • identify different types of machine learning methods.
  • describe the role of a data scientist.


2
Machine Learning Foundations

This quiz will test your overall understanding of machine learning.

3
Install Anaconda

At the end of this lecture, you will be able to:

  • install Anaconda Python and get ready for Python programming
  • run Python using the command-line interface, the Spyder IDE and Jupyter Notebook
  • select the right environment for the problem at hand


4
Python Versions

About Python versions

5
Python and Jupyter Demo

At the end of this lecture, you will be able to:

  • write and run code in a Jupyter notebook
  • structure code in cells and document it with Markdown
  • use the power of Jupyter magic commands
6
Python Basics

This quiz will test your understanding of how to get started with Python programming.

Understanding Data Wrangling

1
Introduction

Introduction to Section 2. At the end of this section, you will be able to:

  • load, manage and analyse data using a Pandas DataFrame
  • apply exploratory data analysis (EDA) methods to extract insight from the data
  • visualize the results as different types of plots
2
Reading from a CSV

At the end of this lecture you will be able to:

  • load data from CSV (comma-separated values) files, or any other delimiter-separated files, as a pandas DataFrame
  • set loading parameters such as the delimiter/separator, the encoding and the index column
  • plot the loaded data
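A minimal sketch of loading a delimiter-separated file with `pd.read_csv`. The semicolon-separated sample rows and the `Date`, `Berri 1` and `Maisonneuve 2` column names are hypothetical stand-ins; with a real file you would pass its path instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited file, encoded as Latin-1 bytes.
raw = io.BytesIO(
    "Date;Berri 1;Maisonneuve 2\n"
    "01/01/2012;35;51\n"
    "02/01/2012;83;153\n"
    "03/01/2012;135;248\n".encode("latin1")
)

bikes = pd.read_csv(
    raw,
    sep=";",               # delimiter/separator used in the file
    encoding="latin1",     # text encoding of the file
    index_col="Date",      # use the Date column as the row index
    parse_dates=["Date"],  # parse that column as datetimes
    dayfirst=True,         # dates are written day/month/year
)
print(bikes)
# bikes.plot() would draw one line per column (requires matplotlib).
```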


    3
    Selecting data and finding the most common complaint type

    This is an example-based lecture. At the end of this lecture, you will be able to:

    • ask business questions and answer them using a pandas DataFrame
    • get a quick data summary from a pandas DataFrame
    • select specific rows and columns from a pandas DataFrame (an operation also called slicing)
    • generate count statistics from the data set
    • plot a bar chart to visualize the result
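A small sketch of the summary, slicing and counting steps, assuming a hypothetical service-request table with `Complaint Type` and `Borough` columns (a real data set would come from `pd.read_csv`).

```python
import pandas as pd

# Hypothetical service-request records.
complaints = pd.DataFrame({
    "Complaint Type": ["Noise", "HEATING", "Noise", "Street Condition",
                       "HEATING", "HEATING"],
    "Borough": ["BROOKLYN", "BRONX", "MANHATTAN", "QUEENS",
                "BROOKLYN", "BRONX"],
})

complaints.info()                                    # quick summary of columns and dtypes
subset = complaints[["Complaint Type", "Borough"]]   # select columns
first_rows = complaints.iloc[:3]                     # select rows by position
counts = complaints["Complaint Type"].value_counts() # counts, sorted descending
print(counts.head(10))
# counts.plot(kind="bar") draws the bar chart (requires matplotlib).
```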
    4
    Which borough has the most noise complaints?

    This is an example-based lecture. At the end of this lecture, you will be able to:

    • filter a pandas DataFrame by one or more conditions
    • understand the internal structure of a DataFrame, such as its Series columns
    • extract the raw data as an array
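A sketch of filtering by one and by two conditions, again on a hypothetical complaints table; the column names and values are illustrative only.

```python
import pandas as pd

complaints = pd.DataFrame({
    "Complaint Type": ["Noise - Street/Sidewalk", "HEATING", "Noise - Street/Sidewalk",
                       "Noise - Street/Sidewalk", "HEATING"],
    "Borough": ["MANHATTAN", "BRONX", "MANHATTAN", "BROOKLYN", "MANHATTAN"],
})

is_noise = complaints["Complaint Type"] == "Noise - Street/Sidewalk"
in_manhattan = complaints["Borough"] == "MANHATTAN"
# Combine boolean masks with & (and) or | (or). When written inline, wrap each
# comparison in parentheses because & binds more tightly than ==.
noisy_manhattan = complaints[is_noise & in_manhattan]

noise_by_borough = complaints[is_noise]["Borough"].value_counts()
column = complaints["Borough"]   # a single column is a pandas Series
raw_values = column.to_numpy()   # the underlying data as a NumPy array
```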
    5
    Which weekday do people bike the most?

    This lecture is based on an example using cyclist data. At the end of this lecture, you will be able to:

    • work with a copy of the data, for example a copy made after filtering by conditions
    • add new columns to a pandas DataFrame
    • use functions such as groupby (similar to SQL GROUP BY) and aggregate
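A sketch of the copy, new-column and groupby steps, using hypothetical daily counts for one bike path (1 January 2012 was a Sunday).

```python
import pandas as pd

# Hypothetical daily counts, indexed by date.
bikes = pd.DataFrame(
    {"Berri 1": [35, 83, 135, 144, 197, 146, 98, 95]},
    index=pd.date_range("2012-01-01", periods=8, freq="D"),
)

berri = bikes[["Berri 1"]].copy()          # explicit copy, safe to modify
berri["weekday"] = berri.index.day_name()  # new column derived from the index
# Split by weekday, then aggregate each group with a sum (like SQL GROUP BY).
weekday_totals = berri.groupby("weekday")["Berri 1"].sum()
print(weekday_totals.sort_values(ascending=False))
```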
    6
    Which month was the snowiest?

    This is an example-based lecture. At the end of this lecture, you will be able to:

    • perform string operations on a pandas DataFrame column
    • resample time-series data in a DataFrame
    • create joint plots to show more than one attribute at a time
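A sketch combining a string operation with resampling to find the snowiest month. The daily weather descriptions are hypothetical; `"MS"` bins the data by calendar month (labelled by month start).

```python
import pandas as pd

# Hypothetical daily descriptions for Jan-Mar 2012 (February 2012 has 29 days).
index = pd.date_range("2012-01-01", "2012-03-31", freq="D")
weather = pd.DataFrame(
    {"Weather": ["Clear"] * 31 + ["Snow"] * 10 + ["Cloudy"] * 19
                + ["Snow Showers"] * 2 + ["Clear"] * 29},
    index=index,
)

# String operation on a column: does the description mention snow?
is_snowing = weather["Weather"].str.contains("Snow")
# Resample into monthly bins and take the share of snowy days per month.
snowy_share = is_snowing.astype(float).resample("MS").mean()
snowiest = snowy_share.idxmax()
print(snowiest.strftime("%B"))
```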
    7
    Cleaning Messy Data

    This lecture is based on an example: cleaning up a data set of service requests. At the end of this lecture, you will be able to:

    • use string manipulation and other pandas functionality to identify messy data
    • clean up the data set so that it is ready for use
    • turn the data cleaning routine into a function
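A sketch of wrapping a cleaning routine in a function. The ZIP-code column, the placeholder strings and `clean_zip_codes` itself are hypothetical examples of the kind of messy values such a routine handles.

```python
import pandas as pd

# Hypothetical placeholder strings found in a messy ZIP-code column.
PLACEHOLDERS = ["N/A", "NO CLUE", ""]

def clean_zip_codes(zips: pd.Series) -> pd.Series:
    """Return a cleaned copy of a ZIP-code column (hypothetical routine)."""
    missing = zips.isna()
    zips = zips.astype(str).str.strip()
    # Blank out genuine missing values and known placeholders.
    zips = zips.mask(missing | zips.isin(PLACEHOLDERS))
    # Keep only the 5-digit prefix of ZIP+4 codes such as "10021-2501".
    zips = zips.str.slice(0, 5)
    # "00000" is not a real ZIP code, so treat it as missing too.
    return zips.mask(zips == "00000")

messy = pd.Series(["10021-2501", "N/A", " 11201", "00000", None, "NO CLUE"])
cleaned = clean_zip_codes(messy)
print(cleaned)
```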
    8
    How to deal with timestamps

    This lecture covers handling date-time data with pandas. At the end of this lecture, you will be able to:

    • use pandas date-time data types
    • convert between different representations of date-time data
    • filter and sort a pandas DataFrame by date-time
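A sketch of converting Unix timestamps and strings to pandas datetimes, formatting them back to text, and filtering and sorting by time. The `events` table is hypothetical.

```python
import pandas as pd

events = pd.DataFrame({
    "name": ["b", "a", "c"],
    "unix_time": [1325376000, 1325462400, 1325548800],  # seconds since 1970-01-01 UTC
})

# Integer Unix timestamps -> datetime64 values.
events["when"] = pd.to_datetime(events["unix_time"], unit="s")
# Text -> datetime, and datetime -> formatted text.
cutoff = pd.to_datetime("2012-01-02")
events["when_str"] = events["when"].dt.strftime("%Y-%m-%d")

# Filter by a date-time condition, then sort newest first.
recent = events[events["when"] >= cutoff].sort_values("when", ascending=False)
print(recent)
```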
    9
    Loading data from SQL databases

    At the end of this lecture, you will be able to:

    • query a SQL database
    • write to a SQL database
    • create a new table in the database from a pandas DataFrame
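A sketch of the write/query round trip using Python's built-in `sqlite3` module and an in-memory database; the `requests` table and its columns are hypothetical. Other databases would typically be reached through a SQLAlchemy engine instead.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")  # throwaway in-memory database

df = pd.DataFrame({"id": [1, 2, 3], "borough": ["BRONX", "QUEENS", "BRONX"]})
# Create a new table from the DataFrame (replacing it if it already exists).
df.to_sql("requests", con, index=False, if_exists="replace")

# Query the database back into a DataFrame, using a bound parameter.
bronx = pd.read_sql_query(
    "SELECT id FROM requests WHERE borough = ? ORDER BY id", con, params=("BRONX",)
)
con.close()
print(bronx)
```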
    10
    Summary

    Section conclusion

    Linear Regression

    1
    Introduction

    Introduction to Section 3. In this section, you will learn how to build linear regression models, along with the basic mathematics and statistics behind them and their business use cases.

    2
    What is linear regression?

    This short lecture presents an example problem that linear regression can solve: deciding how to split a marketing budget across channels using the previous year's data.

    3
    The advertising dataset

    This is an example-based lecture. At the end of this lecture, you will learn:

    • when to use linear regression, and the difference between regression and classification problems
    • more about the advertising data that will be used to create regression models
    4
    EDA questions on advertising data

    At the end of this lecture, you will be able to:

    • do a quick exploratory data analysis before starting a regression modeling exercise
    • ask the relevant business questions before starting a regression modeling exercise
    5
    Simple Linear Regression

    At the end of this lecture, you will be able to:

    • clearly understand linear regression models and their coefficients
    • use the statsmodels package to fit a linear regression model to data
    • use the scikit-learn package to fit a linear regression model to data
    • interpret the results of the modeling exercise
    • make predictions with the fitted models
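A minimal simple-linear-regression sketch with scikit-learn. The data are synthetic (generated with intercept 7.0 and slope 0.05), standing in for the course's advertising data; the `TV` and `sales` names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic TV spend vs. sales, with known intercept 7.0 and slope 0.05.
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, size=200)
sales = 7.0 + 0.05 * tv + rng.normal(0, 1.0, size=200)

X = tv.reshape(-1, 1)                 # scikit-learn expects a 2-D feature matrix
model = LinearRegression().fit(X, sales)
print(f"intercept={model.intercept_:.3f}, slope={model.coef_[0]:.4f}")

# Interpretation: one extra unit of TV spend is associated with `slope` more sales.
new_budget = np.array([[100.0]])
print(model.predict(new_budget))
```

With statsmodels, a comparable fit is usually written with the formula API (`smf.ols("Sales ~ TV", data=df).fit()` after `import statsmodels.formula.api as smf`), which also reports standard errors and p-values in `.summary()`.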
    6
    Hypothesis testing and p-values

    At the end of this lecture, you will be able to:

    • do diagnostic tests for a linear regression model.
    • understand p-values and how to use them to decide whether an estimated coefficient is statistically significant
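To show where a coefficient's p-value comes from, here is a sketch that computes the t-test for the slope of a simple regression by hand, on synthetic data, and cross-checks the standard error against `scipy.stats.linregress`. Packages such as statsmodels report the same quantities automatically.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 300, size=n)
y = 7.0 + 0.05 * x + rng.normal(0, 1.0, size=n)

# Ordinary least squares estimates for y = b0 + b1 * x.
x_mean, y_mean = x.mean(), y.mean()
sxx = np.sum((x - x_mean) ** 2)
b1 = np.sum((x - x_mean) * (y - y_mean)) / sxx
b0 = y_mean - b1 * x_mean

# Standard error of the slope from the residual variance (n - 2 degrees of freedom).
residuals = y - (b0 + b1 * x)
sigma2 = np.sum(residuals ** 2) / (n - 2)
se_b1 = np.sqrt(sigma2 / sxx)

# Two-sided t-test of H0: b1 == 0. A small p-value means we reject H0.
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"b1={b1:.4f}, t={t_stat:.1f}, p={p_value:.3g}")
```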
    7
    R squared

    At the end of this lecture, you will be able to:

    • understand what the R-squared value of a model is
    • assess the quality of a regression model based on R-squared
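R-squared is the share of the variance in the response that the model explains, R² = 1 − SS_res / SS_tot. A sketch computing it by hand on synthetic data and checking it against scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.uniform(0, 300, size=(100, 1))
y = 7.0 + 0.05 * X[:, 0] + rng.normal(0, 3.0, size=100)

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)

ss_res = np.sum((y - y_hat) ** 2)     # variation the model leaves unexplained
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```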
    8
    Multiple linear regression

    At the end of this lecture, you will be able to:

    • understand the multiple regression model
    • use the statsmodels and scikit-learn packages to fit the multiple regression model
    • check the summary of the fit
    • judge which coefficients are statistically significant based on their p-values
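A sketch of a multiple regression fit with scikit-learn on synthetic advertising-style data. The `TV`, `Radio` and `Newspaper` columns are illustrative, and by construction `Newspaper` has no effect on sales, so its estimated coefficient should land near zero.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 200
ads = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
# True model: Newspaper is deliberately left out.
ads["Sales"] = 3.0 + 0.045 * ads["TV"] + 0.19 * ads["Radio"] + rng.normal(0, 1.0, n)

features = ["TV", "Radio", "Newspaper"]
model = LinearRegression().fit(ads[features], ads["Sales"])
print(dict(zip(features, model.coef_.round(4))))
```

scikit-learn does not report p-values; for the coefficient table with p-values, the statsmodels formula API (`smf.ols("Sales ~ TV + Radio + Newspaper", data=ads).fit().summary()`) is the usual route.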
    9
    Model and feature selection

    At the end of this lecture, you will be able to:

    • do feature selection for multiple linear regression models
    • understand limitations of p-value and R-squared
    • use "adjusted R-squared" as an alternative to R-squared
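Adjusted R-squared penalizes each added predictor: 1 − (1 − R²)(n − 1)/(n − p − 1), with n samples and p features. The sketch below (synthetic data, `adjusted_r2` is a hypothetical helper) adds an irrelevant feature: in-sample R² cannot go down, while adjusted R² accounts for the extra parameter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R-squared for a model with an intercept."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

rng = np.random.default_rng(3)
n = 50
x = rng.uniform(0, 10, size=(n, 1))
noise_feature = rng.normal(size=(n, 1))  # unrelated to y by construction
y = 2.0 + 1.5 * x[:, 0] + rng.normal(0, 2.0, size=n)

results = {}
for name, X in [("x only", x), ("x + noise", np.hstack([x, noise_feature]))]:
    r2 = LinearRegression().fit(X, y).score(X, y)
    results[name] = (r2, adjusted_r2(r2, n, X.shape[1]))
    print(name, results[name])
```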
    10
    Model evaluation

    At the end of this lecture, you will be able to:

    • evaluate models using metrics such as MAE, MSE and RMSE
    • evaluate models by splitting the data into training and test sets
    • use the .predict method to make predictions with the fitted model
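A sketch of train/test evaluation with the three error metrics, on synthetic data. RMSE is the square root of MSE and is in the same units as the response; it is always at least as large as MAE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.uniform(0, 300, size=(200, 1))
y = 7.0 + 0.05 * X[:, 0] + rng.normal(0, 1.0, size=200)

# Hold out 25% of the rows to measure error on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}")
```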
    11
    Handling categorical features

    At the end of this lecture, you will be able to:

    • use categorical attributes as part of a linear regression model
    • understand the concept of dummy variables, and use dummy encoding of attributes to build linear regression models
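A sketch of dummy encoding with `pd.get_dummies`, using a hypothetical `Area` column. Dropping one category (`drop_first=True`) avoids perfect collinearity with the intercept; the dropped category becomes the baseline that the other coefficients are compared against.

```python
import pandas as pd

ads = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5],
    "Area": ["urban", "rural", "suburban", "urban"],  # hypothetical categorical column
})

# One 0/1 column per category; "rural" (first alphabetically) is dropped as the baseline.
dummies = pd.get_dummies(ads["Area"], prefix="Area", drop_first=True, dtype=int)
ads_encoded = pd.concat([ads.drop(columns="Area"), dummies], axis=1)
print(ads_encoded)
```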
    12
    Summary

    This lecture points out a few deeper concepts that are not covered in this section, along with references where you can learn about them.

    Logistic Regression

    1
    Introduction

    This is the introduction to the section on logistic regression. In this section:

    • linear regression will be revised
    • the model of logistic regression will be introduced
    • log odds and probability will be discussed
    • how to interpret logistic regression results will be discussed
    2
    Predicting a continuous response

    This is a refresher on linear models: plotting the independent and dependent variables, building a linear model with scikit-learn and plotting the fitted values.

    3
    Quick refresher on linear regression

    Refresher on linear models, continued: how to make predictions with a model and how to interpret its coefficients.

    4
    Predicting a categorical response

    At the end of this lecture, you will be able to:

    • identify problems that involve categorical variables
    • transform attributes/response into categorical type
    • use linear regression as a tool for classification. Note: linear regression is a suboptimal tool for classification. You will learn about logistic regression in the next lecture.
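A sketch of why linear regression is a poor classifier, on toy data where the class is 1 when x > 5: thresholding the output at 0.5 gives sensible labels, but the raw predictions fall below 0 and above 1, so they cannot be read as probabilities.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: class 1 when x > 5, else class 0.
x = np.arange(11, dtype=float).reshape(-1, 1)
y = (x[:, 0] > 5).astype(int)

linreg = LinearRegression().fit(x, y)
raw = linreg.predict(np.array([[-5.0], [3.0], [8.0], [15.0]]))
labels = (raw >= 0.5).astype(int)  # threshold the continuous output at 0.5
print(raw.round(3), labels)
```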
    5
    Using logistic regression

    At the end of this lecture, you will be able to:

    • create a logistic regression model using scikit-learn
    • check the probability of belonging to each class
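The same toy problem with scikit-learn's `LogisticRegression`. `predict_proba` returns one column per class (in the order of `classes_`), and each row sums to 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.arange(11, dtype=float).reshape(-1, 1)
y = (x[:, 0] > 5).astype(int)

logreg = LogisticRegression().fit(x, y)
X_new = np.array([[-5.0], [3.0], [8.0], [15.0]])
proba = logreg.predict_proba(X_new)  # column j = P(class logreg.classes_[j])
labels = logreg.predict(X_new)
print(proba.round(3), labels)
```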
    6
    Probability, odds, log-odds

    At the end of this lecture, you will learn about some foundations required for logistic regression, like:

    • 'probability', 'odds' and the relation between them
    • the 'exponential function' and the 'natural logarithm' and how they relate
    • the log of the odds (log-odds)

    These concepts will help in understanding logistic regression in the next lecture.
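A small worked example of the conversions: odds = p / (1 − p), log-odds = ln(odds), and the logistic function 1 / (1 + e^(−z)) maps log-odds back to a probability.

```python
import math

p = 0.8                                  # probability of the event
odds = p / (1 - p)                       # about 4, i.e. "4 to 1"
log_odds = math.log(odds)                # natural logarithm, about 1.386
p_back = 1 / (1 + math.exp(-log_odds))   # logistic function undoes the log-odds
print(odds, log_odds, p_back)
```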

    7
    What is logistic regression?

    At the end of this lecture, you will be able to:

    • understand the mathematical model of logistic regression
    • understand a few key characteristics of logistic regression output
    • extend logistic regression to multiclass problems (problems with more than two categories)
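For reference, the standard textbook form of the model with one feature x (the lecture's notation may differ). The binary model and its log-odds form are:

```latex
p(x) = \Pr(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}},
\qquad
\ln\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x
```

One common multiclass extension, the multinomial (softmax) model for K classes, is shown below; another is one-vs-rest, which fits one binary model per class.

```latex
\Pr(y = k \mid x) = \frac{e^{\beta_{k0} + \beta_{k1} x}}{\sum_{j=1}^{K} e^{\beta_{j0} + \beta_{j1} x}},
\qquad k = 1, \dots, K
```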
    8
    Interpreting logistic regression

    At the end of this lecture, you will be able to:

    • interpret the results of a logistic regression
    • compute the probabilities using built-in scikit-learn functions
    • understand what happens when the model coefficients increase or decrease
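Because the log-odds are linear in x, a one-unit increase in x multiplies the odds by exp(β₁) regardless of the starting point. A sketch verifying this on synthetic data drawn from a logistic model; `odds` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
x = rng.normal(size=(300, 1))
# Synthetic labels drawn from a true logistic model with slope 1.2.
p_true = 1 / (1 + np.exp(-(-0.5 + 1.2 * x[:, 0])))
y = (rng.uniform(size=300) < p_true).astype(int)

logreg = LogisticRegression().fit(x, y)
b1 = logreg.coef_[0, 0]

def odds(x_value: float) -> float:
    """Odds of class 1 at a single x value, from the fitted probabilities."""
    p = logreg.predict_proba([[x_value]])[0, 1]
    return p / (1 - p)

ratio = odds(1.0) / odds(0.0)   # effect of a one-unit increase in x
print(f"b1={b1:.3f}, exp(b1)={np.exp(b1):.3f}, odds ratio={ratio:.3f}")
```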
    9
    Using logistic regression with categorical features

    At the end of this lecture, you will be able to:

    • build logistic regression models that use categorical variables as features
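A sketch of a logistic regression with a dummy-encoded categorical feature. The `income`, `region` and `bought` columns are hypothetical; "north" is dropped and acts as the baseline region.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data: a numeric feature, a categorical feature and a 0/1 target.
df = pd.DataFrame({
    "income": [20, 35, 50, 65, 80, 95, 30, 60],
    "region": ["north", "south", "west", "north", "south", "west", "west", "north"],
    "bought": [0, 0, 1, 1, 1, 1, 0, 1],
})

# Dummy-encode the categorical column; "north" becomes the baseline.
X = pd.get_dummies(df[["income", "region"]], columns=["region"], drop_first=True, dtype=int)
model = LogisticRegression(max_iter=1000).fit(X, df["bought"])
print(dict(zip(X.columns, model.coef_[0].round(3))))
```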
    10
    Summary

    Summary of learnings from this section.

    Cross Validation

    1
    Introduction

    Introduction to the section on cross-validation. This section is important because cross-validation is used to select parameters, tune models and compare different machine learning models.

    2
    Train/test split

    At the end of this lecture, you will be able to:

    • split data into training and test sets and use them to evaluate machine learning models
    • use the scikit-learn package to split the data
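A sketch of a stratified train/test split on the iris data set that ships with scikit-learn; `stratify=y` keeps the class proportions the same in both parts.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)   # 150 samples, 3 classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)  # accuracy on held-out rows only
print(round(test_accuracy, 3))
```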
    3
    K-fold cross-validation

    At the end of this lecture, you will be able to:

    • understand the foundations of K-fold cross-validation
    • use scikit-learn methods to create the partitions for K-fold cross-validation
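A sketch of how `KFold` partitions the data: with 5 folds, every sample lands in the validation part exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
kf = KFold(n_splits=5, shuffle=True, random_state=0)

test_counts = np.zeros(len(X), dtype=int)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Each fold trains on 8 rows and validates on the remaining 2.
    test_counts[test_idx] += 1
    print(fold, train_idx, test_idx)
```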
    4
    Cross-validation continued

    At the end of this lecture, you will be able to:

    • understand the advantages and disadvantages of cross-validation (CV) compared with a train/test split
    • use best practices for implementing CV
    • use CV for model selection
    • use CV for feature selection
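A sketch of CV-based model selection and feature selection with `cross_val_score` on iris. With an integer `cv` and a classifier, scikit-learn uses stratified folds. The candidate models and the "first two features" subset are arbitrary illustrations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Model selection: compare candidate models by mean 5-fold CV accuracy.
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "knn_k5": KNeighborsClassifier(n_neighbors=5),
}
cv_means = {name: cross_val_score(m, X, y, cv=5, scoring="accuracy").mean()
            for name, m in candidates.items()}
best_model = max(cv_means, key=cv_means.get)

# Feature selection: same procedure, comparing feature subsets instead of models.
subset_means = {
    "all features": cv_means["logistic"],
    "first two features": cross_val_score(
        LogisticRegression(max_iter=1000), X[:, :2], y, cv=5).mean(),
}
print(cv_means, best_model, subset_means)
```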
    5
    Summary

    At the end of this lecture, you will be able to:

    • use repeated cross-validation for more reliable results
    • keep a hold-out set for out-of-sample validation

    You will also get a list of references for further reading.
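A sketch combining both ideas: set aside a hold-out set first, run repeated stratified K-fold CV on the rest, and only then score once on the hold-out data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Hold-out set: not touched during model selection.
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Repeated CV on the development data: 5 folds x 3 repeats = 15 scores.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_dev, y_dev, cv=cv)
print(f"CV accuracy {scores.mean():.3f} +/- {scores.std():.3f}")

# Final out-of-sample check.
holdout_accuracy = model.fit(X_dev, y_dev).score(X_hold, y_hold)
```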

    Regularization

    1
    Introduction

    In this section, you will learn about regularization, the problem of overfitting, how to regularize linear and logistic regression models, and the difference between regularized and unregularized solutions.

    2
    Overfitting

    At the end of this lecture, you will be able to:

    • understand the concept of overfitting
    • identify the primary causes of overfitting
    • identify the impact of overfitting on your models
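A sketch of overfitting with polynomial fits to 10 noisy samples of a sine curve (synthetic data). A degree-9 polynomial can pass through all 10 training points, driving training error to about zero; comparing the printed test errors shows whether that translates into better predictions on new points.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(6)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

errors = {}
for degree in (1, 3, 9):
    # Polynomial.fit rescales x internally, which keeps the least-squares fit well conditioned.
    poly = Polynomial.fit(x_train, y_train, deg=degree)
    errors[degree] = (mse(y_train, poly(x_train)), mse(y_test, poly(x_test)))
    print(degree, errors[degree])
```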


    3
    Overfitting with linear models

    At the end of this lecture, you will be able to:

    • understand the characteristics of good linear models
    • check a model for bias and variance, and understand the bias-variance trade-off
    • avoid overfitting by addressing its root causes


    4
    Regularizing linear models

    At the end of this lecture, you will be able to:

    • understand the concept of regularization
    • visualize the bias-variance trade-off and how to control it
    • get a thorough understanding of Ridge and LASSO regularization
    • choose between Ridge and LASSO regularization based on the problem
    • use best practices for regularization
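A sketch contrasting the two penalties on synthetic data where only 3 of 10 features matter. Ridge shrinks all coefficients toward zero, while LASSO can set some exactly to zero. The `alpha` values are arbitrary illustrations, and the features are standardized because both penalties assume comparable scales.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n, p = 100, 10
X = StandardScaler().fit_transform(rng.normal(size=(n, p)))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))  # only 3 features matter
y = X @ true_coef + rng.normal(0, 1.0, n)

models = {
    "OLS": LinearRegression(),
    "Ridge (alpha=10)": Ridge(alpha=10.0),
    "Lasso (alpha=0.3)": Lasso(alpha=0.3),
}
coefs = {name: m.fit(X, y).coef_ for name, m in models.items()}
for name, c in coefs.items():
    print(f"{name:18s} L2 norm={np.linalg.norm(c):.3f}  zeros={np.sum(c == 0)}")
```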
    5
    Ridge and Lasso Regularization

    This lecture is heavy on theory. At the end of this lecture, you will understand regularization from the foundations, including its geometric interpretation.

    Students not interested in the theory can skip this lecture. The applications of regularization are shown in the next lecture.
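For reference, the standard penalized least-squares objectives (textbook notation with tuning parameter λ ≥ 0; scaling conventions vary, for example scikit-learn's `Lasso` divides the squared-error term by 2n):

```latex
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\; \sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
\qquad
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\; \sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Equivalently, each minimizes the residual sum of squares subject to a budget, \sum_j \beta_j^2 \le t (a disk) for Ridge or \sum_j |\beta_j| \le t (a diamond) for LASSO. The diamond's corners lie on the axes, which is the geometric reason LASSO solutions often have coefficients exactly equal to zero.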

    6
    Regularization using scikit-learn

    This is an example based lecture. At the end of this lecture, you will be able to:

    • regularize solutions using Ridge and LASSO
    • use scikit-learn's regularized linear regression models
    • select the regularization parameter using cross-validation
    • use RidgeCV to select the regularization parameter automatically
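A sketch of automatic selection of the regularization strength with `RidgeCV` and `LassoCV` on synthetic data; the grid of candidate `alphas` is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

rng = np.random.default_rng(8)
X = rng.normal(size=(120, 8))
y = X @ np.array([2.0, -1.0, 0.5, 0, 0, 0, 0, 0]) + rng.normal(0, 1.0, 120)

alphas = np.logspace(-3, 3, 13)                 # candidate regularization strengths
ridge = RidgeCV(alphas=alphas).fit(X, y)        # default: efficient leave-one-out CV
lasso = LassoCV(alphas=alphas, cv=5).fit(X, y)  # 5-fold CV over the same grid
print("ridge alpha:", ridge.alpha_, " lasso alpha:", lasso.alpha_)
```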



    7
    Regularizing logistic models

    At the end of this lecture, you will be able to:

    • regularize classification models
    • use regularization with logistic regression
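A sketch of L2- and L1-regularized logistic regression on the breast-cancer data bundled with scikit-learn. Note that `C` is the inverse of the regularization strength (smaller `C` means a stronger penalty), and the `liblinear` solver is used because it supports the L1 penalty. The value `C=0.1` is an arbitrary illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # 30 numeric features, binary target

# L2 is the default penalty; scaling first keeps the penalty fair across features.
l2_model = make_pipeline(StandardScaler(), LogisticRegression(C=0.1, max_iter=5000))
l1_model = make_pipeline(StandardScaler(),
                         LogisticRegression(penalty="l1", C=0.1, solver="liblinear"))
l2_model.fit(X, y)
l1_model.fit(X, y)

# L1 can set some coefficients exactly to zero (built-in feature selection).
n_zero_l1 = int((l1_model[-1].coef_ == 0).sum())
print("zero coefficients with L1:", n_zero_l1)
```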
    8
    Pipeline and GridSearchCV

    At the end of this lecture, you will be able to:

    • use scikit-learn's Pipeline and GridSearchCV for cross-validation
    • use automated cross-validation schemes to find the best regularization type and parameter
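A sketch of searching over both the penalty type and its strength with `Pipeline` and `GridSearchCV`. Putting the scaler inside the pipeline means each CV fold is scaled using only its own training part. The grid values are arbitrary illustrations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear")),
])
# "<step name>__<parameter>" addresses parameters of a pipeline step.
param_grid = {
    "clf__penalty": ["l1", "l2"],      # regularization type
    "clf__C": [0.01, 0.1, 1.0, 10.0],  # inverse regularization strength
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```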
    9
    Comparing regularized with unregularized models

    In this lecture, you will understand the difference between regularized and unregularized solutions, along with their advantages and disadvantages.


    Detailed Rating

    5 stars: 27
    4 stars: 21
    3 stars: 16
    2 stars: 0
    1 star: 1

    Includes

    4 hours on-demand video
    Full lifetime access
    Access on mobile and TV
    Certificate of Completion