2.22 out of 5
9 reviews on Udemy

Apache Spark: Best Practices for High Performance

Learn how to tune Spark queries for low latency and high throughput in your applications
Instructor:
ASHOK M
58 students enrolled
English [Auto-generated]
Best practices to follow in Spark
Features of Spark 2.0
How to improve the performance of Spark SQL joins
How to improve the performance of Spark programs
Areas to consider to avoid out-of-memory exceptions
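One common way to speed up Spark SQL joins, listed in the curriculum as Broadcast Hash Join, is to send a small table to every executor instead of shuffling both sides of the join. The sketch below simulates that idea in plain Python without Spark. The partitions, table contents, and the `broadcast_hash_join` helper are hypothetical. Real Spark builds and broadcasts its hash table internally, not through user code like this.

```python
from collections import defaultdict

# Hypothetical data: a large "orders" table split into partitions (as Spark
# would spread it across executors) and a small "customers" table.
orders_partitions = [
    [(1, "order-a"), (2, "order-b")],
    [(1, "order-c"), (3, "order-d")],
    [(4, "order-e")],
]
customers = [(1, "Alice"), (2, "Bob"), (3, "Carol")]


def broadcast_hash_join(large_partitions, small_table):
    """Inner join that hashes the small side once and probes it per partition."""
    # Build phase: hash the small table once. In Spark, this hash table is
    # built on the driver and broadcast to every executor.
    built = defaultdict(list)
    for key, value in small_table:
        built[key].append(value)

    # Probe phase: each large-side partition is joined locally against the
    # broadcast copy, so the large table is never shuffled.
    def join_partition(partition):
        return [(key, left, right)
                for key, left in partition
                for right in built.get(key, [])]

    return [join_partition(p) for p in large_partitions]


for part in broadcast_hash_join(orders_partitions, customers):
    print(part)
```

In Spark itself, this strategy is requested with the `broadcast()` hint from `org.apache.spark.sql.functions` (or `pyspark.sql.functions`). Spark also chooses it automatically when the smaller side's estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). The trade-off is memory: every executor must hold the whole broadcast table, which is one way a join can lead to out-of-memory errors.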

Apache Spark is an open-source framework that provides highly generalizable methods for processing data in parallel. On its own, Spark is not a data storage solution. Spark can run locally on a single machine in a single JVM (called local mode). More often, Spark is used together with two other systems. The first is a distributed storage system (such as HDFS, Cassandra, or S3), which holds the data Spark reads and writes. The second is a cluster manager, which distributes the application across the cluster. As of Spark 2.0, Spark supports three kinds of cluster managers: the Standalone Cluster Manager, which is included with Spark and requires Spark to be installed on each node of the cluster; Apache Mesos; and Hadoop YARN.
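The master URL passed to `spark-submit --master` (or to `SparkSession.builder.master(...)`) selects which of these modes an application uses. The following minimal sketch maps the common master URL forms to the modes described above. `classify_master` is a hypothetical helper, not part of Spark. It deliberately ignores less common forms such as multi-master Standalone URLs and ZooKeeper-based Mesos URLs.

```python
import re


def classify_master(url: str) -> str:
    """Hypothetical helper: name the execution mode a Spark master URL selects.

    Only the URL formats come from Spark; this function itself does not.
    """
    # local, local[N], local[*], local[N,F]: everything runs in one JVM.
    if url == "local" or re.fullmatch(r"local\[(\*|\d+)(,\d+)?\]", url):
        return "local mode (single JVM)"
    # spark://HOST:PORT (default port 7077): Spark's own Standalone manager.
    if re.fullmatch(r"spark://[^:/]+:\d+", url):
        return "Standalone Cluster Manager"
    # mesos://HOST:PORT (default port 5050): Apache Mesos.
    if re.fullmatch(r"mesos://[^:/]+:\d+", url):
        return "Apache Mesos"
    # yarn: the ResourceManager address comes from the Hadoop configuration.
    if url == "yarn":
        return "Hadoop YARN"
    raise ValueError(f"unrecognized master URL: {url!r}")


for url in ["local[*]", "spark://master-host:7077",
            "mesos://mesos-host:5050", "yarn"]:
    print(f"{url:28} -> {classify_master(url)}")
```

For YARN, the cluster location comes from the Hadoop configuration (`HADOOP_CONF_DIR` or `YARN_CONF_DIR`) rather than from the URL, which is why the master is simply `yarn`.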

Components of Spark

Spark Core

Spark SQL

Spark Streaming

Spark MLlib

Spark GraphX

Introduction

1. Overview of the Course, Part 1
2. Overview of the Course, Part 2
3. RDDs in Depth

Spark 1.6 vs Spark 2.0

1. SparkContext vs SparkSession
2. Spark DStreams

High Performance in Apache Spark

1. RDD vs DataFrame vs Dataset in Spark
2. Performance: reduceByKey vs groupByKey vs DataFrame
3. Avoiding Garbage Collection Overhead in Spark to Improve Performance
4. How fast can Spark 1.6 sum 1 billion numbers?
5. How fast can Spark 2.0 sum 1 billion numbers?
6. How fast can Spark 1.6 join 1 billion records?
7. How fast can Spark 2.0 join 1 billion records?
8. Broadcast Hash Join to Speed Up Joins in Spark
9. Areas to Consider to Avoid Out-of-Memory Issues
10. Why Spark RDDs Are Immutable
11. Spark with an In-Memory Data Grid: Use Case
12. Top 5 Considerations in Production, Part 1
13. Top 5 Considerations in Production, Part 2
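The reduceByKey vs groupByKey comparison in the lecture list comes down to how much data crosses the shuffle. As a rough sketch (plain Python, hypothetical data and helper names, no Spark involved), the two functions below compute the same word counts and also count the records each approach would send over the network. `reduceByKey` combines values inside each partition before shuffling, while `groupByKey` ships every record.

```python
from collections import Counter, defaultdict

# Hypothetical input: (word, 1) pairs already split into three partitions.
partitions = [
    [("spark", 1), ("flume", 1), ("spark", 1)],
    [("spark", 1), ("spark", 1), ("gcp", 1)],
    [("flume", 1), ("spark", 1)],
]


def group_by_key_style(parts):
    """groupByKey-like: every record is shuffled, then values are summed."""
    shuffled = [record for part in parts for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def reduce_by_key_style(parts):
    """reduceByKey-like: combine within each partition first (map-side combine)."""
    shuffled = []
    for part in parts:
        local = Counter()
        for key, value in part:
            local[key] += value
        # At most one record per key per partition crosses the shuffle.
        shuffled.extend(local.items())
    totals = Counter()
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


print(group_by_key_style(partitions))
print(reduce_by_key_style(partitions))
```

With this toy input, both functions return the counts `{'spark': 5, 'flume': 2, 'gcp': 1}`. The groupByKey-style version shuffles 8 records and the reduceByKey-style version shuffles 6. The gap grows with the number of repeated keys per partition. A DataFrame `groupBy().agg(sum(...))` is planned by Spark SQL with a similar partial aggregation before the shuffle.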

Spark with Flume Integration

1. Flume Overview
2. Flume Use Cases
3. Flume Use Cases, Part 2
4. Spark with Flume Integration Demo

Google Cloud Platform

1. Processing Billions of Records in GCP
2. Clickstream Data Processing Patterns Using MapReduce
Detailed Rating

5 stars: 0
4 stars: 1
3 stars: 3
2 stars: 1
1 star: 4
30-Day Money-Back Guarantee

Includes

3 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion