
Apache Spark Fundamentals

Get the most out of the popular Apache Spark framework to perform efficient analytics on your real-time data
Instructor:
Packt Publishing
9 students enrolled
English [Auto-generated]
History of Apache Spark and an introduction to the Spark components
Learn how to get started with Apache Spark
Introduction to Apache Hadoop, its processes, and its components – HDFS, YARN, and MapReduce
Introduction to the Scala programming language and its fundamentals, such as classes, objects, and collections
Apache Spark programming fundamentals and Resilient Distributed Datasets (RDDs)
See which transformation and action operations can be performed on an RDD
Find out how to load and save data in Spark
Write a Spark application in Scala and execute it on a Hadoop cluster

This video course is a comprehensive tutorial to help you learn all the
fundamentals of Apache Spark, one of the trending big data processing
frameworks on the market today. We will introduce you to the various
components of the Spark framework to efficiently process, analyze, and
visualize data.

You will also get a brief introduction to Apache Hadoop and the Scala
programming language before you start writing Spark programs. You will
learn the Apache Spark programming fundamentals, such as Resilient
Distributed Datasets (RDDs), and see which operations can be used to
perform transformations and actions on an RDD. We'll show you how to
load and save data from various data sources, such as different file
types, NoSQL databases, and RDBMSs. We'll also explain advanced Spark
programming concepts, such as managing key-value pairs and
accumulators. Finally, you'll discover how to create an effective Spark
application and execute it on a Hadoop cluster to gain insights from
your data and make informed business decisions.

By the end of this video course, you will be well-versed in the fundamentals of Apache Spark and able to apply them in your own Spark applications.

About The Author

Nishant Garg has over 16 years of software architecture and
development experience in various technologies, such as Java Enterprise
Edition, SOA, Spring, Hadoop, Hive, Flume, Sqoop, Oozie, Spark, YARN,
Impala, Kafka, Storm, Solr/Lucene, NoSQL databases (such as HBase,
Cassandra, and MongoDB), and MPP databases (such as Greenplum).

He received his MS in software systems from the Birla Institute of
Technology and Science, Pilani, India, and is currently working as a
senior technical architect for the Big Data R&D Labs with Impetus
Infotech Pvt. Ltd. Previously, Nishant enjoyed working with some of the
most recognizable names in the IT services and financial industries,
employing full software lifecycle methodologies such as Agile and
Scrum.

Nishant has also undertaken many speaking engagements on big data
technologies and is the author of Learning Apache Kafka and HBase
Essentials, both published by Packt.

Introducing Spark

1
The Course Overview

This video provides an overview of the entire course.

2
Spark Introduction

What are the origins of Apache Spark and what are its uses?           

3
Spark Components

What are the various components in Apache Spark?           

Hadoop and Spark

1
Introduction to Hadoop

This video explains the complete historical journey from the Nutch project to Apache Hadoop: how the Hadoop project was started, the research papers that influenced it, and so on. Finally, it explains the various goals achieved by developing Hadoop.

2
Hadoop Processes and Components

In this video, we are going to look at the JVM processes that Apache Hadoop runs in the background: the NameNode, DataNode, ResourceManager, and NodeManager. It also provides an overview of the Hadoop components: HDFS, YARN, and the MapReduce programming model.

3
HDFS and YARN

This video shares more details about the Hadoop Distributed File System (HDFS): its goals, its components, and how it works. It also explains another Hadoop component, YARN, covering its components, lifecycle, and use cases.
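
To give a flavor of what working with HDFS looks like in practice, here is a minimal sketch that lists a directory through Hadoop's FileSystem API from Scala. It is illustrative rather than from the course; the root path is an assumption, and on a real cluster the Configuration would pick up the cluster's core-site.xml and hdfs-site.xml.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsList {
  def main(args: Array[String]): Unit = {
    // Obtain a handle on the default filesystem; with Hadoop config
    // files on the classpath this resolves to HDFS, otherwise local.
    val fs = FileSystem.get(new Configuration())

    // List the entries directly under the root path (illustrative).
    fs.listStatus(new Path("/")).foreach(status => println(status.getPath))
  }
}
```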

4
Map Reduce

This video provides an overview of MapReduce, the Hadoop programming model, and its execution behavior at the various stages.
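
To make the model concrete, here is a word-count sketch written with plain Scala collections that mimics the map, shuffle, and reduce stages. It is illustrative only; it uses no Hadoop API and is not taken from the course.

```scala
object MapReduceSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("spark is fast", "hadoop and spark")

    // Map stage: emit a (word, 1) pair for every word in the input.
    val mapped = lines.flatMap(_.split(" ").map(word => (word, 1)))

    // Shuffle stage: group the intermediate pairs by key.
    val shuffled = mapped.groupBy(_._1)

    // Reduce stage: sum the counts for each word.
    val reduced = shuffled.map { case (word, pairs) => (word, pairs.map(_._2).sum) }

    reduced.foreach(println) // e.g. (spark,2)
  }
}
```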

Scala from 30,000 feet

1
Introduction to Scala

The aim of this video is to introduce the Scala language and its features; by the end, you should be able to get started with Scala.
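
As a quick taste of the language (an illustrative sketch, not taken from the course), a first Scala program might look like this:

```scala
object Hello {
  def main(args: Array[String]): Unit = {
    val greeting = "Hello, Scala!"              // immutable value, type inferred
    val shout: String => String = _.toUpperCase // a function value
    println(shout(greeting))                    // prints HELLO, SCALA!
  }
}
```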

2
Scala Programming Fundamentals

The aim of this video is to explain the fundamentals of Scala Programming, such as Scala classes, fields, methods, and the different types of arguments, such as default and named arguments passed to class constructors and methods.            
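
Here is a minimal sketch of those ideas; the class and values are illustrative, not the course's own example.

```scala
// A class with fields, a method, and default arguments.
class Server(val host: String, val port: Int = 8080) {
  def url(scheme: String = "http"): String = s"$scheme://$host:$port"
}

object ServerDemo {
  def main(args: Array[String]): Unit = {
    val a = new Server("example.com")                // port defaults to 8080
    val b = new Server(port = 9090, host = "intra")  // named arguments
    println(a.url())                                 // http://example.com:8080
    println(b.url("https"))                          // https://intra:9090
  }
}
```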

3
Objects in Scala

The aim of this video is to explain objects in the Scala language, including the singleton object, and to outline the uses of objects in Scala applications. It also describes companion objects.
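
A small illustrative sketch of a singleton object and a companion object acting as a factory (all names are assumptions, not from the course):

```scala
// The class keeps its constructor private; its companion object below
// can still call it, which is a common factory pattern.
class Counter private (val start: Int) {
  def next: Counter = new Counter(start + 1)
}

object Counter {                         // companion: same name, same file
  def apply(start: Int = 0): Counter = new Counter(start)
}

object CounterDemo {                     // a plain singleton object
  def main(args: Array[String]): Unit = {
    val c = Counter(41)                  // sugar for Counter.apply(41)
    println(c.next.start)                // 42
  }
}
```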

4
Collections

The aim of this video is to explain the structure of the Scala collections hierarchy and look at examples of different collection types, such as Array, Set, and Map. It also covers how to apply functions to data in collections and outlines the basics of structural sharing.
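
For instance, the three collection types mentioned above, with functions applied to the data (an illustrative sketch, not from the course):

```scala
object CollectionsDemo {
  def main(args: Array[String]): Unit = {
    val nums  = Array(1, 2, 3, 4)
    val langs = Set("scala", "java", "scala")     // duplicates collapse
    val ports = Map("http" -> 80, "https" -> 443)

    // Apply functions to the data: transform, then keep what matches.
    println(nums.map(_ * 2).filter(_ > 4).toList) // List(6, 8)
    println(langs.size)                           // 2
    println(ports.getOrElse("ssh", 22))           // 22 (key absent)
  }
}
```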

Spark Programming

1
Spark Execution

The aim of this video is to start your learning of Apache Spark fundamentals. It introduces you to the Spark component architecture and how different components are stitched together for Spark execution.
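
As a rough sketch of how a driver program wires these components together (the local[*] master and the application name are assumptions for local experimentation, not the course's example):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ExecutionDemo {
  def main(args: Array[String]): Unit = {
    // The SparkConf names the application and the master; on a cluster
    // the master would point at YARN or a standalone master instead.
    val conf = new SparkConf().setAppName("ExecutionDemo").setMaster("local[*]")

    // The SparkContext is the driver's handle to the executors.
    val sc = new SparkContext(conf)
    println(s"Spark ${sc.version}, default parallelism ${sc.defaultParallelism}")
    sc.stop()
  }
}
```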

2
Understanding RDD

The aim of this video is to take the first step towards Spark programming. It explains the SparkContext and the need for Resilient Distributed Datasets (RDDs). It also explains how RDDs change the execution approach used in MapReduce.
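
An illustrative sketch of creating RDDs through the SparkContext (the file path is an assumption, not from the course):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddDemo").setMaster("local[*]"))

    // An RDD from an in-memory collection...
    val numbers = sc.parallelize(1 to 100)
    println(numbers.sum())                 // 5050.0

    // ...or from a file (path is illustrative):
    // val logs = sc.textFile("hdfs:///data/logs.txt")
    sc.stop()
  }
}
```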

3
RDD Operations

The aim of this video is to explain the operations that can be applied to RDDs. These operations come in two forms, transformations and actions, and the video covers various operations in both categories, with examples.
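
A small sketch of the distinction (data and names are assumptions, not the course's example):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddOpsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddOpsDemo").setMaster("local[*]"))

    val words  = sc.parallelize(Seq("spark", "scala", "hadoop", "hdfs"))
    val sWords = words.filter(_.startsWith("s"))  // transformation: lazy
    val upper  = sWords.map(_.toUpperCase)        // transformation: lazy

    println(upper.collect().mkString(", "))       // action: SPARK, SCALA
    println(words.count())                        // action: 4
    sc.stop()
  }
}
```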

Advanced Spark Programming

1
Loading and Saving Data in Spark

The aim of this video is to explain and demonstrate loading and storing data in Spark with different file types, such as text, CSV, JSON, and sequence files; different filesystems, such as the local filesystem, Amazon S3, and HDFS; and different databases, such as MySQL, PostgreSQL, and HBase.
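
An illustrative sketch of the text-file case (all paths are assumptions; the databases mentioned above each need their own connector and are not shown):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LoadSaveDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LoadSaveDemo").setMaster("local[*]"))

    // textFile accepts local paths as well as hdfs:// and s3 URIs.
    val lines  = sc.textFile("input/events.csv")
    val errors = lines.filter(_.contains("ERROR"))

    // Writes a directory of part files, one per partition.
    errors.saveAsTextFile("output/errors")
    sc.stop()
  }
}
```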

2
Managing Key-Value Pairs

The aim of this video is to explain the motivation behind key-value-based RDDs and how to create them. Next, it explains the various transformations and actions that can be applied to key-value RDDs. Finally, it explains data partitioning techniques in Spark.
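
A minimal sketch of these ideas (data and names are assumptions, not from the course):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PairRddDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PairRddDemo").setMaster("local[*]"))

    // A key-value RDD built from (key, value) tuples.
    val sales = sc.parallelize(Seq(("books", 10), ("games", 5), ("books", 7)))

    // A key-based transformation: sum the values per key.
    println(sales.reduceByKey(_ + _).collect().toMap) // Map(books -> 17, games -> 5)

    // Control how pairs are spread across partitions.
    val partitioned = sales.partitionBy(new HashPartitioner(4))
    println(partitioned.partitioner)                  // Some(...HashPartitioner...)
    sc.stop()
  }
}
```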

3
Accumulators

The aim of this video is to explain a few more advanced concepts, such as accumulators, broadcast variables, and passing data to external programs using pipes.
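
A small sketch of an accumulator and a broadcast variable (data and names are assumptions, not from the course; piping to external programs is not shown):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedVarsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SharedVarsDemo").setMaster("local[*]"))

    val badLines  = sc.longAccumulator("badLines")  // written on executors
    val stopWords = sc.broadcast(Set("a", "the"))   // read-only, shipped to every node

    val lines = sc.parallelize(Seq("the spark book", "###", "a hadoop guide"))
    val words = lines.flatMap { line =>
      if (!line.exists(_.isLetter)) badLines.add(1) // count malformed lines
      line.split(" ").filterNot(stopWords.value)    // drop broadcast stop words
    }

    println(words.count())   // the action triggers the job
    println(badLines.value)  // read back on the driver: 1
    sc.stop()
  }
}
```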

4
Writing a Spark Application

The aim of this video is to demonstrate writing Spark jobs using the Eclipse-based Scala IDE, creating Spark job JAR files, and, finally, copying a Spark job to a Hadoop cluster and executing it there.
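
A minimal sketch of what such an application and its submission might look like (the class name, paths, and cluster settings are assumptions, not the course's actual example):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A complete word-count job; the master is supplied by spark-submit.
object WordCountApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCountApp"))
    sc.textFile(args(0))                  // input path, e.g. on HDFS
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))            // output path
    sc.stop()
  }
}

// Packaged as wordcount.jar, it could then be submitted to the cluster:
//   spark-submit --class WordCountApp --master yarn \
//     wordcount.jar hdfs:///input hdfs:///output
```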

3.3 out of 5
4 ratings

Detailed Rating

5 stars: 1
4 stars: 1
3 stars: 1
2 stars: 0
1 star: 1
30-Day Money-Back Guarantee

Includes

2 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion