Are you looking to gain in-depth knowledge of machine learning and deep learning? If yes, then this Learning Path is just right for you.
Packt’s Video Learning Paths are a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it.
R is one of the leading technologies in the field of data science. Starting out at a basic level, this Learning Path will teach you how to develop and implement machine learning and deep learning algorithms using R in real-world scenarios.
The Learning Path begins by covering some basic concepts of R to refresh your knowledge before we deep-dive into the advanced techniques. You will start by setting up the environment and then perform data ETL in R. You will then learn important machine learning topics, including data classification, regression, clustering, association rule mining, and dimensionality reduction. Next, you will understand the basics of deep learning and artificial neural networks (ANNs) and then move on to exploring topics such as RNNs and CNNs. Finally, you will learn about the applications of deep learning in various fields and understand the practical implementation of scalability, HPC, and feature engineering.
By the end of the Learning Path, you will have a solid knowledge of all these algorithms and techniques and be able to implement them efficiently in your data science projects.
Do not worry if this seems too far-fetched right now; we have combined the best works of the following esteemed authors to ensure that your learning journey is smooth:
About the Authors
Selva Prabhakaran is a data scientist with a large e-commerce organization. In his 7 years of experience in data science, he has tackled complex real-world data science problems and delivered production-grade solutions for top multinational companies.
Yu-Wei Chiu (David Chiu) is the founder of LargitData, a startup company that mainly focuses on providing Big Data and machine learning products. He has previously worked for Trend Micro as a software engineer, where he was responsible for building Big Data platforms for business intelligence and customer relationship management systems. In addition to being a startup entrepreneur and data scientist, he specializes in using Spark and Hadoop to process Big Data and apply data mining techniques for data analysis.
Vincenzo Lomonaco is a deep learning PhD student at the University of Bologna and founder of ContinuousAI, an open source project aiming to connect people and reorganize resources in the context of continuous learning and AI. He is also the PhD students’ representative at the Department of Computer Science and Engineering (DISI) and a teaching assistant for the Machine Learning and Computer Architectures courses in the same department.
Mastering R Programming
This video gives an overview of the entire course.
In this video, we will take a look at how to perform univariate analysis.
The goal of this video is to perform bivariate analysis in R using three cases.
In this video, we will see how to detect and treat outliers.
The goal of this video is to see how to treat missing values in R.
In this video, we'll see what linear regression is, its purpose, when to use it, and how to implement it in R.
We'll see how to interpret regression results and interaction effects in this video.
In this video, we will discuss what residual analysis is and how to detect multivariate outliers using Cook's distance.
The goal of this video is to understand how to do model selection and comparison using best subsets, stepwise regression and ANOVA.
In this video, we will see how to perform k-fold cross-validation in R.
The goal of this video is to learn how to build non-linear regression models using splines and GAMs.
Our goal in this video is to understand logistic regression, the evaluation metrics of binary classification problems, and the interpretation of the ROC curve.
In this video, we will understand the concept and working of the naïve Bayes classifier and how to implement it in R.
In this video, we will look at what the k-nearest neighbors algorithm is, how it works, and how to implement it in R.
The goal of this video is to understand how decision trees work, what they are used for, and how to implement them.
The goal of this video is to learn what the various features of the caret package are and how to use it to build predictive models.
The goal of this video is to know how to do feature selection before building predictive models.
In this video, we will look at how support vector machines work.
In this video, we will look at the concept behind bagging and random forests and how to implement them to solve problems.
Let's understand what boosting is and how stochastic gradient boosting works with GBM.
In this video, we will look at what regularization is, ridge and lasso regression, and how to implement them.
Let's look at how XGBoost works and how to implement it in this video.
Our goal in this video is to understand principal components and how to use them in R to reduce the dimensionality of data.
In this video, we will understand the k-means clustering algorithm and implement it using the principal components.
In this video, we will analyze the clustering tendency of a dataset and identify the ideal number of clusters or groups.
The goal of this video is to understand the logic of hierarchical clustering, its types, and how to implement it in R.
How to use affinity propagation to cluster data points? How is it different from conventional algorithms?
How to build recommendation engines to recommend products/movies to new and existing users?
The goal of this video is to understand what a time series is, how to create time series of various frequencies, and the enhanced facilities available in the xts package.
The goal of this video is to understand a key characteristic of a time series, stationarity, and how to de-trend and de-seasonalize a time series.
In this video, we will introduce time series measures such as the ACF, PACF, and CCF; why they matter; and how to interpret them.
Our goal in this video is to understand moving averages and exponential smoothing and use them to forecast.
In this video, we will understand how double exponential smoothing and Holt-Winters forecasting work, when to use them, and how to implement them in R.
Let's look at what ARIMA forecasting is, understand the concepts, and learn how ARIMA modelling works in this video.
In this video, we'll take a look at how to scrape data from web pages and how to clean and process raw web and other textual data.
Our goal in this video is to learn how to process texts using the tm package and understand the significance of TF-IDF and its implementation. Finally, we will see how to draw a word cloud in R.
Let's see how to use cosine similarity and latent semantic analysis to find and map similar documents.
In this video, we will see how to extract the underlying topics in a document, the keywords related to each topic and the proportion of topics in each document.
Let's check out how to perform sentiment analysis and scoring in R.
How to classify texts with machine learning algorithms using the RTextTools package?
The goal of this video is to understand the basic structure for making charts with ggplot, how to customize the aesthetics, and how to manipulate the theme elements.
In this video, we will see how to manipulate the legend the way we want and how to add text and annotations in ggplot.
The goal of this video is to understand how to plot multiple plots in the same chart and how to change the layouts of ggplot.
How to make various types of plots in ggplot, such as bar charts, time series plots, boxplots, and ribbon charts?
In this video, we will understand what the popular ggplot extensions are, where to find them, and what they are used for.
We will discuss the best practices that should be followed to minimize code runtime in this video.
Let's tackle the implementation of parallel computing in R.
The goal of this video is to understand how to work with dplyr and pipes.
In this video, we will discuss how to manipulate data with the data.table package, how to achieve maximum speed, and what the various features of data.table are.
Our main focus in this video is to understand how to write C++ code and make it work in R. We will also leverage the speed of C++ in R by using Rcpp to interface C++ with R and writing Rcpp code.
We'll take a look at the components of an R package in this video.
In this video, we will look at how to create an R package so that it can be submitted to CRAN.
We will understand the mandatory checks and common problems faced by developers when creating R packages in this video.
The goal of this video is to show how to submit an R package to CRAN.
R Machine Learning Solutions
This video gives you brief information about the course.
R must first be installed on your system before you can work with it.
RStudio makes the process of development with R easier.
R packages are an essential part of R, as they are required in all our programs. Let's learn how to install and load them.
To work with data, you must know how to read data into R and write it back out. You will learn that here.
Data manipulation is time-consuming, so it helps to do it with R's built-in functions.
R is widely used for statistical applications, so it is worth learning its built-in statistical functions.
To communicate information effectively and make data easier to comprehend, we need graphical representation. You will learn to plot figures in this section.
Because R's built-in datasets are limited in size and scope, it is good practice to get data from external repositories. You will be able to do just that after this video.
Reading a dataset is the first and foremost step in data exploration. We need to learn how to do that.
In R, since nominal, ordinal, interval, and ratio variables are treated differently in statistical modeling, we have to convert a nominal variable from a character type into a factor.
Missing values can distort the inferences drawn from a dataset, so it is important to detect them.
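A minimal sketch of that conversion in base R, assuming a hypothetical character vector of ticket classes (not the course's dataset). It shows an unordered factor for a nominal variable and, for comparison, an ordered factor for an ordinal one:

```r
# Hypothetical example data: a character vector of ticket classes
pclass_chr <- c("first", "third", "second", "third", "first")

# Nominal variable: convert from character to an (unordered) factor
pclass <- factor(pclass_chr)
print(levels(pclass))  # levels are sorted alphabetically by default

# Ordinal variable: declare the level order explicitly
pclass_ord <- factor(pclass_chr,
                     levels = c("third", "second", "first"),
                     ordered = TRUE)
print(pclass_ord[1] > pclass_ord[2])  # "first" ranks above "third", so TRUE
```

Modeling functions such as lm() and glm() expand a factor into dummy variables, which is why a nominal variable left as character can be handled differently than intended.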
After detecting missing values, we need to impute them as their absence may affect the conclusion.
After imputing the missing values, you should perform an exploratory analysis to summarize the data characteristics.
The exploratory analysis helps users gain insights into how single or multiple variables may affect the survival rate. However, it does not determine which combinations of variables can produce a prediction model. We need to use a decision tree for that.
After constructing the prediction model, it is important to validate how the model performs while predicting the labels.
Another way of measuring performance is the ROC curve.
With huge datasets, we can estimate the characteristics of the entire dataset from a sample of the data, which makes data sampling essential.
Probability distributions and statistics are interdependent: to justify statistical conclusions, we need probability.
Univariate statistics deals with a single variable and hence is very simple.
To analyze the relation among more than two variables, multivariate analysis is done.
Assessing the relation between dependent and independent variables is carried out through linear regression.
To determine whether experimental results are statistically significant, we perform hypothesis testing.
To compare a sample mean with a hypothesized value, or the means of two different groups, one- and two-sample t-tests are conducted.
Comparing a sample with a reference probability distribution, or comparing the cumulative distributions of two datasets, calls for a Kolmogorov-Smirnov test.
The Wilcoxon test is a non-parametric test of the null hypothesis that, unlike the t-test, does not assume normally distributed data.
To check whether the distribution of a categorical variable differs between two groups, Pearson's chi-squared test is used.
To examine the relation between categorical independent variables and a continuous dependent variable, ANOVA is used. When there is a single categorical independent variable (factor), one-way ANOVA is used.
When there are two categorical independent variables (factors) to be compared, two-way ANOVA is used.
Simple linear regression is the simplest regression model and can be used when there is a single predictor variable.
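Both uses of the test can be sketched with base R's ks.test(). The samples below are simulated for illustration only; the seed and distributions are arbitrary assumptions:

```r
set.seed(42)  # make the random samples reproducible

x <- rnorm(100)          # sample drawn from a standard normal distribution
y <- runif(100, -2, 2)   # sample drawn from a uniform distribution

# One-sample test: compare x with a reference distribution (the standard normal CDF)
ks_one <- ks.test(x, "pnorm", mean = 0, sd = 1)

# Two-sample test: compare the empirical cumulative distributions of x and y
ks_two <- ks.test(x, y)

print(ks_one$p.value)
print(ks_two$p.value)
```

The test statistic D is the largest vertical distance between the two cumulative distribution functions being compared.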
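A short sketch with chisq.test(), using the gender-by-party contingency table from the example on R's own ?chisq.test help page rather than any dataset from the course:

```r
# Contingency table: rows are the two groups, columns are the categories
party <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(party) <- list(gender = c("F", "M"),
                        party = c("Democrat", "Independent", "Republican"))

# Pearson's chi-squared test of independence between group and category
chi <- chisq.test(party)
print(chi)

# Expected counts under the null hypothesis of identical distributions
print(chi$expected)
```

A small p-value suggests that the distribution of party affiliation differs between the two groups.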
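A minimal sketch of both designs with aov(), using R's built-in warpbreaks data (chosen here for illustration; it is not from the course). It has a continuous response, breaks, and two factors, wool and tension:

```r
# One-way ANOVA: does the mean number of breaks differ across tension levels?
one_way <- aov(breaks ~ tension, data = warpbreaks)
print(summary(one_way))

# Two-way ANOVA: add wool type as a second factor, plus the wool:tension interaction
two_way <- aov(breaks ~ wool * tension, data = warpbreaks)
print(summary(two_way))
```

Use wool + tension instead of wool * tension to fit the two main effects without the interaction term.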
To obtain summarized information of a fitted model, we need to learn how to summarize linear model fits.
It is often useful to predict unknown values, and you can do that with a fitted linear regression model.
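A sketch of fitting and predicting with lm() and predict(), assuming R's built-in cars data (stopping distance versus speed) as a stand-in example. The new speed values are arbitrary:

```r
# Fit a simple linear regression: stopping distance as a function of speed
fit <- lm(dist ~ speed, data = cars)
print(summary(fit))

# New predictor values whose response we want to estimate
new_speeds <- data.frame(speed = c(10, 21))

# Point predictions plus 95% confidence intervals for the mean response
pred <- predict(fit, newdata = new_speeds, interval = "confidence", level = 0.95)
print(pred)
```

Use interval = "prediction" instead to get wider intervals for individual new observations rather than for the mean response.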
To check if the fitted model adequately represents the data, we perform diagnostics.
When the relationship between the predictor and response variables is non-linear, we can fit a polynomial regression model. This video will show you how.
An outlier can pull the fitted regression line away from the trend followed by the rest of the data. To reduce this effect, we can fit a robust linear regression model.
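A sketch comparing ordinary least squares with a robust fit. It assumes rlm() from the MASS package, which ships with standard R installations, and R's built-in stackloss data. The course may use a different function or dataset:

```r
library(MASS)  # provides rlm(), robust regression by M-estimation

# Ordinary least squares vs. robust fit on the built-in stackloss data
ols_fit    <- lm(stack.loss ~ ., data = stackloss)
robust_fit <- rlm(stack.loss ~ ., data = stackloss)

# Compare coefficients; rlm down-weights observations with large residuals
print(cbind(OLS = coef(ols_fit), Robust = coef(robust_fit)))

# Observations that received reduced weight (< 1) in the robust fit
print(which(robust_fit$w < 1))
```

With the default Huber weighting, each weight is at most 1. Points with large residuals get smaller weights, so they pull less on the fitted line.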
We will perform linear regression on a real-life example, the SLID dataset.
GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
GLM allows response variables with error distributions other than the normal distribution. We apply the Poisson model to see how that is done.
When the response variable is binary, we apply the binomial model.
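A minimal sketch of both GLM families with base R's glm(). The Poisson counts are the Dobson (1990) example from R's ?glm help page. The binomial fit uses the built-in mtcars data (am is 0/1), chosen here only for illustration:

```r
# Poisson GLM: count response, log link
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)   # factor with 3 levels, cycling
treatment <- gl(3, 3)      # factor with 3 levels, in blocks of 3
pois_fit  <- glm(counts ~ outcome + treatment, family = poisson())

# Fitted means are the inverse link (exp) applied to the linear predictor
eta <- predict(pois_fit, type = "link")
mu  <- predict(pois_fit, type = "response")
print(all.equal(mu, exp(eta)))

# Binomial GLM: binary response (automatic vs. manual transmission), logit link
bin_fit <- glm(am ~ wt, data = mtcars, family = binomial())
print(summary(bin_fit))

# Predicted probabilities lie between 0 and 1
prob <- predict(bin_fit, type = "response")
print(range(prob))
```

Choosing the family sets both the assumed error distribution and the default link function (log for Poisson, logit for binomial).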
GAM has the ability to deal with non-linear relationships between dependent and independent variables. We learn to fit a regression using GAM.
Visualizing a GAM helps us understand it better.
You can also run diagnostics on a fitted GAM to check how well it fits the data.
Training and testing datasets are both essential for building a classification model.
A partitioning tree works by applying split conditions, starting from the root node and moving down to the terminal nodes.
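A short sketch of a recursive partitioning tree. It assumes the rpart package, a recommended package shipped with R, and the kyphosis data bundled with it, not the dataset used in the course:

```r
library(rpart)  # recursive partitioning and regression trees

# Grow a classification tree on the kyphosis data bundled with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Each printed line is a node: its split condition, observation count, and predicted class
print(fit)

# Each row is routed from the root node down to a terminal node to get its class
pred <- predict(fit, type = "class")
print(table(predicted = pred, actual = kyphosis$Kyphosis))
```

Terminal nodes show up as "<leaf>" in fit$frame$var. Every other row of that frame names the variable used for the split at that node.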