3.65 out of 5
114 reviews on Udemy

Introduction to Apache Spark for Developers and Engineers

Basic to intermediate level introduction to Apache Spark that provides the main skills required to use the technology
Instructor:
Adastra Academy
575 students enrolled
English [Auto-generated]
Identify and understand the concepts of Big Data
Clearly describe Apache Spark
Understand and explain the various components of the Spark framework
Differentiate between Spark and Hadoop MapReduce
Download, install and use Spark on a local machine
Identify and understand the main Scala programming language concepts
Develop basic Spark applications
Explain and use Spark Resilient Distributed Datasets

What is Apache Spark?

Apache Spark is a next-generation, open-source Big Data processing engine. It is designed for fast processing of large datasets across a wide range of applications. Spark supports in-memory cluster computing, which greatly improves the speed of iterative algorithms and interactive data mining tasks.

Course Outcomes

‘Introduction to Apache Spark’ includes illuminating video lectures, practical hands-on Scala and Spark exercises, a guide to local installation of Spark, and quizzes. In this course, we guide students through:

  • An explanation of the Spark framework
  • The basics of programming in Scala, Spark’s native language
  • An outline of how to work with Spark’s primary abstraction, resilient distributed datasets (RDDs)

Upon completion of the course, students will be able to explain core concepts relating to Spark, understand the fundamentals of coding in Scala, and execute basic programming and data manipulation in Spark. This course will take approximately 8 hours to complete.

Recommended Experience

Programming Languages recommended for this course:

  • Scala (course exercises are in Scala)
  • Java
  • Python

Recommended for:

  • Data scientists and engineers
  • Developers
  • Individuals with a basic understanding of: Apache Hadoop, Big Data, programming languages (Scala, Java, or Python)

For students unfamiliar with Big Data and Hadoop, the course will provide a brief overview of each topic.

Why Adastra Academy?

Adastra Academy is a leading source of training and development for Information Management professionals and individuals interested in Data Management and Analytics technology. Our dedication to identifying and mastering emerging technologies means our students are among the first to have access to these quality courses. For an exceptional learning experience, our programs include hands-on labs and real-world examples that allow students to easily apply their new knowledge.

Overview of Big Data

1
1.1 Section 1 Introduction and topics
2
1.2 Overview of Big Data and Hadoop

This lecture discusses:

  • What big data is
  • Creation history of Hadoop
  • Overview of the MapReduce model
3
1.3 Big Data Features and Traditional Data Warehousing Characteristics

This lecture discusses:

  • Traditional data warehousing features
  • Big data features
4
1.4 Use Case: Adastra's Big Data Reference Architecture

This lecture discusses:

  • How big data tools fit into an enterprise solution
5
1.5 Section Conclusion
6
1.6 Big Data Concepts Quiz

What is Apache Spark

1
2.1 Introduction and topics
2
2.2 Apache Spark Overview

This lecture discusses:

  • What Apache Spark is
  • Spark programming languages
  • Spark's built-in libraries
3
2.3 Spark's History

This lecture discusses:

  • Creation history of Spark
  • Spark's growth
  • Companies using Spark
4
2.4 Why Use Spark

This lecture discusses:

  • Comparison of Spark and MapReduce
  • Reasons for choosing Spark
5
2.5 Section Conclusion
6
2.6 Spark Concepts Quiz

Spark Infrastructure

1
3.1 Introduction and Topics
2
3.2 Spark Deployment Modes

This lecture discusses:

  • Spark deployment modes
    • Local stand-alone
    • Stand-alone cluster
    • Shared cluster
3
3.3 Hands-on Exercise: Installing Stand-Alone Spark
4
3.4 Hands-on Exercise: Install Stand-Alone Spark on your computer

This hands-on exercise will guide you through:

  • Installation of Scala
  • Local installation of stand-alone Apache Spark
  • Downloading of sample data used for course exercises
5
3.5 Spark Install Quiz
6
3.6 The Spark Framework

This lecture discusses:

  • Cluster managers
  • Spark core
  • Built-in libraries
7
3.7 Spark Application Concepts

This lecture discusses:

  • Driver program
  • SparkContext
  • Executors
  • Stand-alone applications
8
3.8 Section Conclusion
9
3.9 Spark Infrastructure Quiz

The Scala Programming Language

1
4.1 Introduction and topics
2
4.2 Scala Introduction & Language Features

This lecture discusses:

  • Introduction to Scala
  • Scala main features
3
4.3 Scala Language Basics-Base Types

This lecture discusses:

  • Scala base types
4
4.4 Hands-on Examples: Scala Base Types

This hands-on exercise provides practice with:

  • Scala base types
5
4.5 Scala Language Basics-Operators

This lecture discusses:

  • Scala operators
6
4.6 Hands-on Examples: Scala Operators

This hands-on exercise provides practice with:

  • Scala operators
7
4.7 Scala Language Constructs-Variables

This lecture discusses:

  • Variables in Scala
8
4.8 Hands-on Examples: Scala Variables

This hands-on exercise provides practice with:

  • Variables in Scala
9
4.9 Scala Language Constructs-Variables Quiz
10
4.10 Scala Language Constructs-Arrays

This lecture discusses:

  • Arrays in Scala
11
4.11 Hands-on Examples: Scala Arrays

This hands-on exercise gives practice with:

  • Arrays in Scala
12
4.12 Scala Language Constructs-Lists

This lecture discusses:

  • Lists in Scala
13
4.13 Hands-On Exercise: Scala Lists

This hands-on exercise provides practice with:

  • Lists in Scala
14
4.14 Scala Language Constructs-Collections

This lecture discusses:

  • Collections in Scala
15
4.15 Quiz: Scala Arrays and Lists
16
4.16 Scala Language Constructs-IF Expressions

This lecture discusses:

  • Scala IF expressions
17
4.17 Hands-On Exercise: Scala IF Expressions

This hands-on exercise provides practice with:

  • Scala IF expressions
18
4.18 Scala Language Constructs-MATCH-CASE Expressions

This lecture discusses:

  • Scala Match-case expressions
19
4.19 Hands-On Exercise: Scala MATCH-CASE Expressions

This hands-on exercise provides practice with:

  • Scala Match-case expressions
20
4.20 Scala Language Constructs-WHILE & FOR Loop Expressions

This lecture discusses:

  • Scala while loop expressions
  • Scala for loop expressions
21
4.21 Hands-On Exercise: Scala WHILE & FOR Loop Expressions

This hands-on exercise provides practice with:

  • Scala while loop expressions
  • Scala for loop expressions
22
4.22 Quiz: Scala Loops and Execution Flow
23
4.23 Scala Language Basics-Functions

This lecture discusses:

  • Functions in Scala
24
4.24 Hands-On Exercise: Scala Functions

This hands-on exercise provides practice with:

  • Functions in Scala
25
4.25 Quiz: Scala Functions: Greatest Common Divisor
26
4.26 Scala Language Basics-Anonymous Functions

This lecture discusses:

  • Anonymous functions in Scala
27
4.27 Hands-on Examples: Anonymous Functions

This hands-on exercise provides practice with:

  • Anonymous functions in Scala
28
4.28 Scala Functions - Create your own function
29
4.29 Scala Functions - quiz solution
30
4.30 Section Conclusion

Resilient Distributed Datasets

1
5.1 Introduction and sections
2
5.2 Resilient Distributed Datasets-Overview

This lecture discusses:

  • What are Resilient Distributed Datasets (RDDs)?
  • Why use RDDs?
3
5.3 Resilient Distributed Datasets

This lecture discusses:

  • RDD Operations
    • Transformations
      • RDD Fault Tolerance
      • Directed Acyclic Graph
      • Lazy Evaluation
    • Actions
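
The lazy-evaluation behaviour named in this lecture can be sketched roughly as follows. This example is not taken from the course materials: the object name, the `sumOfEvenSquares` helper, the `local[*]` master, and the sample range are all assumptions, and it presumes Spark's core library is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyEvaluationSketch {

  // Hypothetical helper: sums the even squares of 1..upTo.
  // upTo should be at least 2, because reduce fails on an empty RDD.
  def sumOfEvenSquares(sc: SparkContext, upTo: Int): Int = {
    // Transformations: Spark only records these steps in the lineage graph (DAG).
    val squares = sc.parallelize(1 to upTo).map(n => n * n)
    val evenSquares = squares.filter(_ % 2 == 0)

    // Action: this call is what actually triggers the computation.
    evenSquares.reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // Assumed settings: run locally, using all available cores.
    val conf = new SparkConf().setAppName("LazyEvaluationSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(s"Sum of even squares: ${sumOfEvenSquares(sc, 10)}")
    } finally {
      sc.stop()
    }
  }
}
```

Because `map` and `filter` are lazy, no work is done until `reduce` is called. The same recorded lineage is what allows Spark to recompute a lost partition.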
4
5.4 Hands-On Exercise: RDDs Lazy Evaluation & Actions

This hands-on exercise provides practice with:

  • Creating RDDs
  • Performing transformations and actions on RDDs
5
5.5 RDDs Lazy Evaluation & Actions
6
5.6 Resilient Distributed Datasets-How to Create

This lecture discusses:

  • RDD creation methods
    • Loading from an external dataset
    • Parallelizing an existing dataset
    • Creating from an existing RDD
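
As a hedged illustration of the three creation methods listed above, the sketch below builds one RDD each way. It is an assumption-laden example rather than course material: the `createAndCount` helper, the temporary file that stands in for an external dataset, and the sample words are all made up for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}

object RddCreationSketch {

  // Hypothetical helper: returns the element counts of three RDDs,
  // one per creation method.
  def createAndCount(sc: SparkContext): (Long, Long, Long) = {
    // A temporary file stands in for a real external dataset.
    val path = Files.createTempFile("rdd-creation-sketch", ".txt")
    Files.write(path, "spark\nscala\nhadoop\n".getBytes(StandardCharsets.UTF_8))
    try {
      // 1. Loading from an external dataset: one element per line of text.
      val fromFile = sc.textFile(path.toUri.toString)

      // 2. Parallelizing an existing in-memory collection.
      val fromCollection = sc.parallelize(Seq(1, 2, 3, 4))

      // 3. Creating from an existing RDD by applying a transformation.
      val longWords = fromFile.filter(_.length > 5)

      (fromFile.count(), fromCollection.count(), longWords.count())
    } finally {
      Files.delete(path)
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed settings: run locally, using all available cores.
    val conf = new SparkConf().setAppName("RddCreationSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(createAndCount(sc))
    } finally {
      sc.stop()
    }
  }
}
```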
7
5.7 Hands-On Exercise: Creating an RDD from a Collection

This hands-on exercise provides practice with:

  • Creating RDDs from a collection
8
5.8 RDD Creation
9
5.9 Pair Resilient Distributed Datasets

This lecture discusses several topics relating to RDD key/value pairs:

  • What pair RDDs are
  • Creating pair RDDs
  • Performing transformations on pair RDDs
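
A word count is one common way to picture pair RDDs. The sketch below is not from the course materials: the `countWords` helper and the sample sentences are hypothetical, and it assumes Spark 1.3 or later, where pair-RDD operations such as `reduceByKey` are available without an extra import.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairRddSketch {

  // Hypothetical helper: counts how often each word occurs in the given lines.
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    val words = sc.parallelize(lines)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)

    // Creating a pair RDD: each element becomes a (key, value) tuple.
    val pairs = words.map(word => (word, 1))

    // A transformation specific to pair RDDs: combine the values for each key.
    val counts = pairs.reduceByKey(_ + _)

    // collect() is the action that brings the results back to the driver.
    counts.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumed settings: run locally, using all available cores.
    val conf = new SparkConf().setAppName("PairRddSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      countWords(sc, Seq("to be or not to be", "to do")).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```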
10
5.10 Hands-On Exercise: Pair RDDs

This hands-on exercise provides practice with:

  • Creating pair RDDs
11
5.11 Pair RDDs - Joining datasets
12
5.12 Resilient Distributed Datasets-Persistence

This lecture discusses:

  • RDD persistence
    • cache() method
    • persist() method
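
The difference between the two persistence methods can be sketched as below. This is an illustrative example under stated assumptions, not the course's own exercise: the `countAndSum` helper, the chosen storage level, and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistenceSketch {

  // Hypothetical helper: runs two actions over the same persisted RDD.
  def countAndSum(sc: SparkContext, upTo: Int): (Long, Long) = {
    val squares = sc.parallelize(1 to upTo).map(n => n.toLong * n)

    // For RDDs, cache() is shorthand for persist(StorageLevel.MEMORY_ONLY);
    // persist() lets the caller choose a storage level explicitly.
    squares.persist(StorageLevel.MEMORY_AND_DISK)

    // The first action computes and stores the partitions;
    // the second action can reuse them instead of recomputing.
    val count = squares.count()
    val total = squares.reduce(_ + _)

    // Release the stored partitions once they are no longer needed.
    squares.unpersist()
    (count, total)
  }

  def main(args: Array[String]): Unit = {
    // Assumed settings: run locally, using all available cores.
    val conf = new SparkConf().setAppName("PersistenceSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(countAndSum(sc, 10))
    } finally {
      sc.stop()
    }
  }
}
```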
13
5.13 Resilient Distributed Datasets-Shared Variables

This lecture discusses:

  • shuffle operations
  • shared variables
    • broadcast variables
    • accumulator variables
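
The two kinds of shared variable listed above might be used together roughly as follows. This sketch is not from the course materials: the `countUnknownCodes` helper and the lookup table are hypothetical, and it assumes Spark 2.0 or later, where `longAccumulator` is available on `SparkContext` (earlier releases used a different accumulator API).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedVariablesSketch {

  // Hypothetical helper: counts the codes that are missing from a broadcast lookup table.
  def countUnknownCodes(sc: SparkContext, codes: Seq[String]): Long = {
    // Broadcast variable: a read-only value shipped once to each executor.
    val countries = sc.broadcast(Map("CA" -> "Canada", "CZ" -> "Czech Republic"))

    // Accumulator: tasks can only add to it; the driver reads the total.
    val unknown = sc.longAccumulator("unknownCodes")

    // foreach is an action, so the accumulator is updated when it runs.
    sc.parallelize(codes).foreach { code =>
      if (!countries.value.contains(code)) unknown.add(1L)
    }

    unknown.value
  }

  def main(args: Array[String]): Unit = {
    // Assumed settings: run locally, using all available cores.
    val conf = new SparkConf().setAppName("SharedVariablesSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(s"Unknown codes: ${countUnknownCodes(sc, Seq("CA", "XX", "CZ", "YY"))}")
    } finally {
      sc.stop()
    }
  }
}
```

Updating the accumulator inside an action rather than a transformation avoids double counting if a transformation is re-executed.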
14
5.14 Hands-on Examples: Distributed Shared Variables

This hands-on exercise provides practice with:

  • Creating and using shared variables
15
5.15 "Advanced" data processing with Spark
16
5.16 "Advanced" data processing with Spark - quiz solution
17
5.17 Section Conclusion
3.7 out of 5
114 Ratings

Detailed Rating

Stars 5
41
Stars 4
39
Stars 3
25
Stars 2
6
Stars 1
3
30-Day Money-Back Guarantee

Includes

2 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion