3.65 out of 5
38 reviews on Udemy

Big Data Analysis with Apache Spark PySpark: Hands on Python

Learn to analyse batch and streaming data with the DataFrame API of Apache Spark, using Python and PySpark
Instructor:
Ankit Mistry
2,793 students enrolled
English [Auto-generated]
Basic overview of Spark technology
End-to-end installation of Apache Spark on a Windows machine
End-to-end installation of Apache Spark on a Linux machine
Set up an Apache Spark cluster on Microsoft Azure HDInsight
Learn Spark SQL
Learn Spark DataFrame API
Spark Structured Streaming

Welcome to the Apache Spark: PySpark course.

Have you ever wondered how big companies like Google, Microsoft, Facebook, Apple, or Amazon process petabytes of data on thousands of machines?

This course is a starting point for learning Apache Spark, an in-memory big data analysis tool.

==============================================

What previous students have said: 

“Very good introduction. Ideal for beginners to obtain a big picture as a starting point. The course should be further developed and supplemented with further practical examples. But overall I would highly recommend.”     

“I like the pace at which the instructor is going. I like the fact that he quickly dives into the practical. For me, this helps to put subsequent learning into perspective. He tends to have quite a few typos, but I can overlook those and still give him a 5 star rating. I am still quite early in the. Hope to update my review as I go along.”

“Great course, knowledgeable author.”

“Excellent course for anyone who wants to learn about Big Data and Apache Spark with PySpark.”

==================================================

Apache Spark can run up to 100x faster than the Hadoop MapReduce data processing framework on some in-memory workloads, which makes Apache Spark one of the most in-demand skills.

Top companies like Google, Facebook, Microsoft, Amazon, and Airbnb use Apache Spark to solve their big data problems. Analysing huge amounts of data is one of the most valuable skills nowadays, and this course will teach you the skills you need to compete in the big data job market.

This course will teach:

  • Introduction to big data and Apache Spark

  • Getting started with Databricks

  • Detailed installation steps on an Ubuntu Linux machine

  • Python refresher for newcomers

  • Apache Spark DataFrame API

  • Apache Spark Structured Streaming with an end-to-end example

  • Basics of machine learning and feature engineering with Apache Spark

This course is not yet complete; new content on Spark ML will be added.

Note: This course teaches only the Spark 2.0 DataFrame-based API, not the RDD-based API, as the DataFrame-based API is the future of Spark.

Regards

Ankit Mistry

Introduction

1
Introduction
2
Big data Overview
3
Traditional Data Storage and Processing Software vs Big data
4
Timeline of Big Data and Hadoop-based Ecosystems
5
What is Apache Spark
6
Spark API Overview

Databricks

1
Getting started with Databricks - For eager Sparkers

Installation of Apache spark - Windows

1
Introduction
2
Installation Part - 1 and 2
3
Download and install anaconda
4
Installation Part - 3 and 4
5
Installation Instruction Windows

Installation of Apache spark - Ubuntu

1
Different Ways of Installation
2
Cloud Digital Ocean Setup - Installation -1
3
Python3 and Jupyter notebook Installation -2
4
Install Java, Scala, Py4j, Spark - Installation -3
5
Set Path variable and start Jupyter notebook - Installation -4
6
Installation Instruction Ubuntu

Setup Apache Spark in Cloud

1
Different cloud Providers
2
Setup Spark cluster on Microsoft Azure HDInsight

Apache Spark features

1
Spark Timeline
2
RDD - Resilient Distributed Dataset
3
Transformation and Action

Spark Data frame API

1
Introduction
2
Spark Session
3
Spark-submit
4
Import JSON data into Dataframe
5
Define Custom schemaType
6
Data frame as SQL Table
7
Data frame Operation - 1
8
Data frame Operation - 2
9
Filter data
10
Handling Missing data
11
Dealing with datetime in Dataframe

Machine Learning

1
Introduction
2
What is Machine Learning
3
Traditional system of computing vs Machine Learning way of computing
4
Machine learning system design
5
Types of Machine Learning
6
Spark ML API overview

Feature engineering

1
Introduction
2
TF-IDF - importance of a term in a document
3
TF-IDF code along
4
Stop Word remover and MinMax Scaler
5
More Feature engineering Techniques
6
More topics

Structured Streaming

1
Introduction to Structured Streaming
2
Streaming example

Special Bonus offer

1
Discount for other courses

Annex - Python Basics

1
Numbers & Math operators
2
Variables and Datatypes
3
Dynamic Typing in Python
4
String
5
Boolean variable and conditional logic
6
List
7
List Comprehensions
8
Dictionary
9
Sets and tuples
10
Looping in Python
11
Function - I
12
Function - II
13
Lambda Function
You can view and review the lecture materials indefinitely, like an on-demand channel.
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don't have an internet connection, some instructors also let their students download course lectures. That's up to the instructor though, so make sure you get on their good side!
3.7 out of 5
38 Ratings

Detailed Rating

5 stars: 15
4 stars: 10
3 stars: 10
2 stars: 2
1 star: 1
30-Day Money-Back Guarantee

Includes

6 hours on-demand video
5 articles
Full lifetime access
Access on mobile and TV
Certificate of Completion