4.49 out of 5
159 reviews on Udemy

The Ultimate Apache Spark with Java Course – Hands On!

Learn how to slice and dice data using the next generation big data platform - Apache Spark!
Instructor:
Imtiaz Ahmad
1,101 students enrolled
English [Auto-generated]
Utilize the most powerful big data batch and stream processing engine to solve big data problems
Master the new Spark Java Datasets API to slice and dice big data in an efficient manner
Build, deploy and run Spark jobs in the cloud and benchmark performance on various hardware configurations
Optimize Spark clusters to work on big data efficiently and understand performance tuning
Transform structured and semi-structured data using Spark SQL, Dataframes and Datasets
Implement popular Machine Learning algorithms in Spark such as Linear Regression, Logistic Regression, and K-Means Clustering

Newly Launched Oct 2018

Apache Spark is a next-generation batch and stream processing engine. It has been reported to run up to 100 times faster than Hadoop MapReduce on some workloads, and it makes distributed big data applications much easier to develop. Its demand has skyrocketed in recent years, and having this technology on your resume is truly a game changer. Over 3,000 companies are using Spark in production right now, and the list is growing very quickly! Some of the big names include Oracle, Hortonworks, Cisco, Verizon, Visa, Microsoft, and Amazon, as well as most of the world's big banks and financial institutions!

In this course you’ll learn everything you need to know about using Apache Spark in your organization with its latest Java Datasets API. Below are some of the things you’ll learn:

  • How to develop Spark Java Applications using Spark SQL Dataframes

  • Understand how the Spark Standalone cluster works behind the scenes

  • How to use various transformations to slice and dice your data in Spark Java

  • How to marshal/unmarshal Java domain objects (POJOs) while working with Spark Datasets

  • Master joins, filters and aggregations, and ingest data of various sizes and file formats (TXT, CSV, JSON, etc.)

  • Analyze over 18 million real-world comments on Reddit to find the top trending words

  • Develop programs using Spark Streaming for streaming stock market index files

  • Stream network sockets and messages queued on a Kafka cluster

  • Learn how to develop the most popular machine learning algorithms using Spark MLlib

  • Covers the most popular algorithms: Linear Regression, Logistic Regression and K-Means Clustering

You’ll develop over 15 practical Spark Java applications that crunch through real-world data, slicing and dicing it with several data transformation techniques. This course is especially valuable for anyone who wants to be hired as a Java developer or data engineer, because Spark is a highly sought-after skill. We’ll also go over how to set up a live cluster and configure Spark jobs to run in the cloud, and you’ll learn about the practical implications of performance tuning and scaling out a cluster to work with big data. This course has a 30-day money-back guarantee, and you will have access to all of the code used in the course.

Introduction

1. Why Spark
2. Spark High Level Components
3. Creating a Spark Maven Project
4. Import Source Code into Eclipse
5. First Spark Application
6. Spark Standalone Cluster Architecture

Spark Java Dataset API Basics

1. Ingesting CSV and JSON Files
2. How to Reduce Logging in the Console
3. Real World Dataframes Example
4. Union Dataframes and Other Set Transformations
5. Converting Between Datasets and Dataframes

Diving Deeper with Datasets, Dataframes, Transformations and the DAG

1. Map and Reduce Transformation Functions
2. Using Datasets with User Defined POJOs
3. Using Datasets with Unstructured Textual Data
4. Joining Dataframes and Using Various Filter Transformations
5. Aggregation Transformations + Join Assignment
6. More on Transformations, Actions and the DAG

Running Spark Jobs on the Cloud

1. Using Spark to Analyze Reddit Comments
2. Running the Reddit Spark Application on an EMR Cluster
3. Instructions for Configuring a Spark Standalone Cluster

Spark Streaming Applications

1. Streaming Network Socket Example
2. Stock Market Files Streaming Example
3. Using Kafka with Spark Streaming

Machine Learning with Spark MLlib

1. Machine Learning Resources
2. Overview of Linear Regression
3. Spark Java Linear Regression Example
4. Overview of Logistic Regression
5. Spark Java Logistic Regression (Classification Algorithm)
6. Overview of K-Means Clustering
7. Spark Java K-Means Clustering Example
4.5 out of 5
159 Ratings

Detailed Rating

5 stars: 83
4 stars: 65
3 stars: 9
2 stars: 1
1 star: 1
30-Day Money-Back Guarantee

Includes

7 hours on-demand video
4 articles
Full lifetime access
Access on mobile and TV
Certificate of Completion