Big Data Processing using Apache Spark

Leverage one of the most efficient and widely adopted Big Data processing frameworks: Apache Spark
Packt Publishing
9 students enrolled
English [Auto-generated]
Understand the Spark API and its architecture
Know the difference between the RDD and DataFrame APIs
Learn to join large amounts of data
Start a project using Apache Spark
Discover how to write efficient jobs using Apache Spark
Test Spark code correctly
Leverage Apache Spark to process big data more rapidly

Every year, the amount of data we need to store and analyze grows substantially. Aggregating all the data about our users and analyzing it for insights can mean processing terabytes of data. To process such volumes, we need a technology that can distribute computations across multiple machines and make them more efficient. Apache Spark is a technology that allows us to process big data in a faster and more scalable way.

In this course, we will learn how to leverage Apache Spark to process big data quickly. We will cover the basics of the Spark API and its architecture in detail. In the second section of the course, we will learn about data mining and data cleaning, looking at the input data structure and how input data is loaded. In the third section, we will write actual jobs that analyze data. By the end of the course, you will have a sound understanding of the Spark framework, which will help you write code and understand how big data is processed.

About the Author

Tomasz Lelek is a software engineer who programs mostly in Java and Scala. He is a fan of microservices architecture and functional programming. He dedicates considerable time and effort to getting better every day. He recently dived into Big Data technologies such as Apache Spark and Hadoop. Tomasz is passionate about nearly everything associated with software development. Recently he was a speaker at conferences in Poland, Confitura and JDD (Java Developers Day), and also at the Krakow Scala User Group. He has also conducted a live coding session at the Geecon conference.

Writing Big Data Processing Using Apache Spark

The Course Overview

This video gives an overview of the entire course.

Overview of Apache Spark and Its Architecture

In this video, we will cover the Spark Architecture.

Start a Project Using Apache Spark, Look at build.sbt

This video focuses on creating a project.

Creating the Spark Context

This video shows the installation of spark-submit on our machine.

Looking at the Spark API

In this video, we will look at the Spark API.

Data Mining and Data Cleaning

Looking at the Input Data Structure

In this video, we think about what problem we want to solve.

Using RDD API in the Data Mining Process

In this video, we will learn about the Spark API for loading data.

Loading Input Data

In this video, we will cover how to load input data.

Cleaning Input Data

In this video, we look at how to tokenize input data.
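
As a rough illustration of what tokenizing can involve, here is a minimal sketch in plain Python rather than Spark. The function name `tokenize` and the cleaning rules (lowercasing, keeping only runs of letters) are assumptions made for illustration and are not taken from the course. In a Spark job, a function like this would typically be passed to a transformation such as `flatMap`.

```python
import re

def tokenize(line):
    """Split one line of text into lowercase word tokens.

    Hypothetical cleaning rules: lowercase the line, then keep only
    runs of ASCII letters, so punctuation, digits, and whitespace
    are dropped.
    """
    return re.findall(r"[a-z]+", line.lower())

lines = ["Hello, Spark!", "  big DATA, big   jobs  "]

# Flatten the per-line token lists into one list of words,
# which is the role flatMap plays on an RDD of lines.
tokens = [word for line in lines for word in tokenize(line)]
print(tokens)
```

With these rules, a word such as "don't" would be split into two tokens, so real input may need a more careful pattern.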

Writing Job Logic

Logic for Counting Words

This video shows how to implement the word-counting logic.
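
The word-counting logic can be sketched as follows. This is plain Python, not the course's code: the helper `count_words` is hypothetical, and it only mimics the pattern commonly used with the RDD API, where each word is mapped to a `(word, 1)` pair and the pairs are then summed per key (`map` followed by `reduceByKey`).

```python
from collections import defaultdict

def count_words(words):
    """Count occurrences of each word in an iterable of words."""
    # "Map" step: pair every word with an initial count of 1.
    pairs = ((word, 1) for word in words)
    # "Reduce by key" step: sum the counts that share the same word.
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

words = ["big", "data", "big", "jobs"]
print(count_words(words))
```

On a cluster, Spark performs the per-key summation in parallel across partitions; the loop here only shows the logic on a single machine.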

Using RDD API Transformations and Actions to Solve a Problem

In this video, we will focus on solving problems.

Testing Spark Job

This video shows how to write a robust Spark test suite.

Summary of Data Processing

This video shows how to start our Apache Spark job for two textbooks.

30-Day Money-Back Guarantee


1 hour of on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion