2.5 out of 5 (3 reviews on Udemy)

Real World Spark 2 – Interactive Python pyspark Core

Build a Vagrant Python pyspark cluster and code/monitor against Spark 2 Core, the modern cluster-computation engine.
Instructor: Toyin Akin
139 students enrolled
Simply run a single command on your desktop, go for a coffee, and come back to a running distributed environment ready for cluster deployment
Automate the installation of software across multiple Virtual Machines
Code in Python against Spark: Transformations, Actions, and Spark Monitoring

Note: This course builds on top of the “Real World Vagrant – Build an Apache Spark Development Env! – Toyin Akin” course. If you do not already have a Spark environment installed (within a VM or directly on your machine), take that course first.

Spark’s Python shell provides a simple way to learn the API, as well as a powerful tool for analyzing data interactively. Start it by running the following command anywhere in a bash terminal within the built Virtual Machine:

pyspark

Spark’s primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). RDDs can be created from collections, from Hadoop InputFormats (such as HDFS files), or by transforming other RDDs.
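The key idea behind RDDs is that transformations (such as map and filter) are lazy and only build up a plan, while actions (such as collect) actually trigger computation. The following is a minimal pure-Python sketch of that idea for illustration only; the class and method bodies here are not the real pyspark implementation, although the map/filter/collect call chain mirrors the genuine pyspark RDD API.

```python
# Minimal sketch of the lazy-RDD idea: transformations build a deferred plan,
# an action evaluates it. Illustration only -- not the real pyspark internals.

class MiniRDD:
    def __init__(self, data, ops=None):
        self.data = data
        self.ops = ops or []          # deferred transformations, applied in order

    def map(self, fn):                # transformation: lazy, returns a new MiniRDD
        return MiniRDD(self.data, self.ops + [("map", fn)])

    def filter(self, pred):           # transformation: also lazy
        return MiniRDD(self.data, self.ops + [("filter", pred)])

    def collect(self):                # action: actually runs the accumulated plan
        out = list(self.data)
        for kind, fn in self.ops:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

rdd = MiniRDD(range(10))
squares_of_evens = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares_of_evens.collect())   # [0, 4, 16, 36, 64]
```

In the real shell the equivalent chain would start from the SparkContext, e.g. `sc.parallelize(range(10)).filter(...).map(...).collect()`, and each action you run shows up as a job in the Web UI described below.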

Spark Monitoring and Instrumentation

While creating RDDs, performing transformations, and executing actions, you will work heavily within the monitoring view of the Web UI.

Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. This includes:

A list of scheduler stages and tasks
A summary of RDD sizes and memory usage
Environmental information
Information about the running executors
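If port 4040 is already in use (for example, by another running SparkContext), Spark binds the next UI to successive ports (4041, 4042, and so on). The port can also be fixed explicitly via the standard `spark.ui.port` property, for example in `conf/spark-defaults.conf`:

```
# conf/spark-defaults.conf -- pin the application web UI to a fixed port
spark.ui.port    4040
```

The same property can be passed on the command line with `--conf spark.ui.port=4040` when launching pyspark.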

Why Apache Spark …

Apache Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Apache Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing. Apache Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, and R shells. Apache Spark can combine SQL, streaming, and complex analytics.

Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Introduction to Python, Spark Core via pyspark

1. A quick tour of Python pyspark
2. Suggested Spark Udemy curriculum courses to follow. You do not need to take/purchase the first three courses if you already have Spark installed.

Author, Equipment and Compensation

1. My experience within the Enterprise
2. Spark job compensation for those in this field
3. Memory Requirements
4. Recommended Hardware for Spark and Hadoop labs

Setup the Environment

1. Resource files for the course
2. Spark setup
3. Walking through the Base Vagrant Spark Box
4. Upgrade and Package the Vagrant Box to Spark 2
5. Register the updated Vagrant Spark Box

Interact with Spark Core (Python)

1. Boot up and Walkthrough of the pyspark Python Environment
2. Configure and Startup a Spark Environment for Distributed Computing
3. Python Spark RDD, Transformations, Actions and Monitoring I
4. Python Spark RDD, Transformations, Actions and Monitoring II
5. Python Spark RDD, Transformations, Actions and Monitoring III
6. Python Spark RDD, Transformations, Actions and Monitoring IV
7. Python Spark RDD, Transformations, Actions and Monitoring V
8. Python Spark RDD, Transformations, Actions and Monitoring VI
9. Python Spark RDD, Transformations, Actions and Monitoring VII

Conclusion

1. Conclusion


Detailed Rating

5 stars: 1
4 stars: 0
3 stars: 0
2 stars: 1
1 star: 1
30-Day Money-Back Guarantee

Includes

3 hours on-demand video
3 articles
Full lifetime access
Access on mobile and TV
Certificate of Completion