4 out of 5 (2 reviews on Udemy)

Learn Apache Spark with Python

A Complete Guide to Integrating the Apache Spark Framework with Python Programming
Introduction to PySpark
Filtering RDDs
Install and run Apache Spark on a desktop computer or on a cluster
Understand how Spark SQL lets you work with structured data
Understand Spark through examples, and much more

Apache Spark is one of the hottest Big Data skills today. More and more organizations are adopting Apache Spark to build their big data processing and analytics applications, and the demand for Apache Spark professionals is skyrocketing. Learning Apache Spark is a great route to good jobs, better quality of work and the best remuneration packages.

You might already know Apache Spark as a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. It is well known for its speed, ease of use, generality and ability to run virtually everywhere. Although Spark is one of the most requested tools for data engineers, data scientists can also benefit from it when doing exploratory data analysis, feature extraction, supervised learning and model evaluation.

The course covers many topics in Apache Spark with Python, including:

  • What makes Spark a power tool for Big Data and Data Science?

  • Learn the fundamentals of Spark including Resilient Distributed Datasets, Spark Actions and Transformations

  • Explore Spark SQL with CSV, JSON and MySQL (JDBC) data sources

  • Convenient links to download all source code

Module 1 Introduction to Spark with Python

1. Introduction to the Module
2. What is PySpark
3. PySpark in Industry
4. Why Go for Python

Module 2 Introduction to Big Data and Hadoop

1. Big Data Overview
2. Facts about Big Data
3. Big Data Scenarios
4. Apache Hadoop Framework
5. Top Hadoop Users
6. History of Hadoop
7. Difference between RDBMS and Hadoop
8. Cluster Modes in Hadoop
9. Hadoop Ecosystem
10. HDFS Daemons and MapReduce Daemons
11. Hadoop Cluster Architecture
12. Top Reasons Why You Should Learn Hadoop
13. Hadoop Distributions and Compatibilities
14. Hadoop Ecosystem in Detail
15. Hadoop Distributed File System
16. HDFS Files and Blocks
17. HDFS Components and Architecture
18. HDFS File Read and Write

Module 3 Apache Spark Framework

1. Batch and Real Time Analytics
2. Why Spark When Hadoop Is Already There
3. Introduction to Apache Spark
4. Features of Apache Spark
5. Users and Use Cases of Apache Spark
6. Job Execution Flow and Spark Execution
7. Spark Unified Stack
8. Complete Picture of Apache Spark
9. Apache Spark Architecture
10. Top Companies Using Spark

Module 4 Python Programming Language

1. Getting Started with Python
2. Introduction to Python
3. Advantages and Facts about Python
4. First Python Program
5. Program Execution and Python IDE
6. Built-in Types in Python
7. Numbers Data Type in Python
8. String and List Data Type
9. Dictionary, Tuples and Sets
10. Variables and Assignment
11. Hands-On
12. Hands-On
13. Hands-On
14. Hands-On
15. Hands-On

Module 5 Advanced Part of Apache Spark with Python

1. Downloading and Installing Enthought Canopy
2. Downloading and Installing the JDK
3. Downloading and Installing Spark
4. Downloading and Setup of winutils
5. Setting up Environment Variables
6. Running the First Spark Program
7. Downloading and Extracting Movie Ratings Datasets
8. Running Ratings Counter Spark Program
9. Understanding Key Value Pairs with an Example
10. Filtering RDD Using an Example
11. Finding Maximum Temperature by Location
12. Map vs FlatMap
13. Understanding FlatMap Using Word Count Example
14. Sorting the Word Count Results
15. Total Amount Spent Example
16. Sorting the Total Amount Spent Example Result

Module 6 Deep Dive Into Spark with Python

1. Most Popular Movie Example
2. Understanding Broadcast Variables with an Example
3. Finding Similar Movies Example
4. Finding Most Popular Superhero Example
5. Superhero Degrees of Separation Part 1
6. Superhero Degrees of Separation Part 2

Module 7 SparkSQL in Apache Spark with Python

1. Executing SQL Commands
2. Using SQL Style Functions Instead of Queries
3. Using DataFrames Instead of RDDs

Module 8 MLlib in Apache Spark with Python

1. Using MLlib to Produce Movie Recommendations
2. Using DataFrame with MLlib Using an Example
You can view and review the lecture materials indefinitely, like an on-demand channel.
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don't have an internet connection, some instructors also let their students download course lectures. That's up to the instructor though, so make sure you get on their good side!

Detailed Rating

5 stars: 0
4 stars: 2
3 stars: 0
2 stars: 0
1 star: 0
30-Day Money-Back Guarantee

Includes

8 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion