4.12 out of 5
459 reviews on Udemy

CCA 175 – Spark and Hadoop Developer – Python (pyspark)

Cloudera Certified Associate Spark and Hadoop Developer using Python as Programming Language
Entire curriculum of CCA Spark and Hadoop Developer
Apache Sqoop
HDFS Commands
Python Fundamentals
Core Spark - Transformations and Actions
Spark SQL and Data Frames
Streaming analytics using Kafka, Flume and Spark Streaming

CCA 175 Spark and Hadoop Developer is one of the well-recognized Big Data certifications. This scenario-based certification exam requires basic programming in Python or Scala, along with Spark and other Big Data technologies.

This comprehensive course covers all aspects of the certification using Python as the programming language.

  • Python Fundamentals

  • Core Spark – Transformations and Actions

  • Spark SQL and Data Frames

  • File formats

  • Flume, Kafka and Spark Streaming

  • Apache Sqoop

Exercises are provided to help you prepare before taking the certification exam. The course is intended to build your confidence for the exam.

All the demos are given on our state-of-the-art Big Data cluster. You can get one week of complimentary lab access by filling out the form provided in the welcome message.

Introduction

1. CCA 175 Spark and Hadoop Developer - Curriculum
2. Using labs for preparation
3. Setup Development Environment (Windows 10) - Introduction
4. Setup Development Environment - Python and Spark - Pre-requisites
5. Setup Development Environment - Python Setup on Windows
6. Setup Development Environment - Configure Environment Variables
7. Setup Development Environment - Setup PyCharm for developing Python applications
8. Setup Development Environment - Pass run time arguments or parameters
9. Setup Development Environment - Download Spark compressed tar ball
10. Setup Development Environment - Install 7z for uncompress and untar on windows
11. Setup Development Environment - Setup Spark
12. Setup Development Environment - Install JDK
13. Setup Development Environment - Configure environment variables for Spark
14. Setup Development Environment - Install WinUtils - integrate Windows and HDFS
15. Setup Development Environment - Integrate PyCharm and Spark on Windows 10

Python Fundamentals

1. Introduction and Setting up Python
2. Basic Programming Constructs
3. Functions in Python
4. Python Collections
5. Map Reduce operations on Python Collections
6. Setting up Data Sets for Basic I/O Operations
7. Basic I/O operations and processing data using Collections
8. Get revenue for given order id - as application
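
As a rough illustration of the "Map Reduce operations on Python Collections" and "Get revenue for given order id" lectures listed above, the sketch below filters, maps, and reduces a small list of comma-separated records. It is not taken from the course materials: the sample records, the field positions (order id in the second field, subtotal in the fifth), and the function name get_order_revenue are all assumptions made for this example.

```python
from functools import reduce

# Hypothetical sample records, assumed layout:
# order_item_id,order_id,product_id,quantity,subtotal,product_price
ORDER_ITEMS = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
    "4,2,403,1,129.99,129.99",
]


def get_order_revenue(order_items, order_id):
    """Return the total subtotal for one order id (0.0 if it has no items)."""
    # filter: keep only the records that belong to the requested order
    matching = filter(
        lambda rec: int(rec.split(",")[1]) == order_id, order_items
    )
    # map: extract the subtotal field as a float
    subtotals = map(lambda rec: float(rec.split(",")[4]), matching)
    # reduce: add the subtotals together, starting from 0.0
    total = reduce(lambda acc, value: acc + value, subtotals, 0.0)
    return round(total, 2)


if __name__ == "__main__":
    print(get_order_revenue(ORDER_ITEMS, 2))
```

The same filter, map, and reduce steps carry over conceptually to Spark RDDs, where the records would come from a file rather than an in-memory list.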

Getting Started

1. Setup Environment - Options
2. Setup Environment - Locally
3. Setup Environment - using Cloudera Quickstart VM
4. Using Itversity platforms - Big Data Developer labs and forum
5. Using itversity's big data labs
6. Using Windows - Putty and WinSCP
7. Using Windows - Cygwin
8. HDFS Quick Preview
9. YARN Quick Preview
10. Setup Data Sets

Data ingestion using Sqoop

1. Introduction and Objectives
2. Accessing Sqoop Documentation
3. Preview of MySQL on labs
4. Sqoop connect string and validating using list commands
5. Run queries in MySQL using eval
6. Sqoop Import - Simple Import
7. Sqoop Import - Execution Life Cycle
8. Sqoop Import - Managing Directories
9. Sqoop Import - Using split by
10. Sqoop Import - Different file formats
11. Sqoop Import - Using compression
12. Sqoop Import - Using Boundary Query
13. Sqoop Import - columns and query
14. Sqoop Import - auto reset to one mapper
15. Sqoop Import - Delimiters and handling nulls
16. Sqoop Import - Incremental Loads
17. Sqoop Import - Hive - Create Hive Database
18. Sqoop Import - Hive - Simple Hive Import
19. Sqoop Import - Hive - Managing Hive tables
20. Sqoop Import - Import all tables
21. Role of Sqoop in typical data processing life cycle
22. Sqoop Export - Simple export with delimiters
23. Sqoop Export - Understanding export behaviour
24. Sqoop Export - Column Mapping
25. Sqoop Export - Update and Upsert
26. Sqoop Export - Stage Tables

Apache Spark 1.6 - Transform, Stage and Store

1. Introduction
2. Introduction to Spark
3. Setup Spark on Windows
4. Quick overview about Spark documentation
5. Connecting to the environment
6. Initializing Spark job using pyspark
7. Create RDD from HDFS files
8. Create RDD from collection - using parallelize
9. Read data from different file formats - using sqlContext
10. Row level transformations - String Manipulation
11. Row Level Transformations - map
12. Row Level Transformations - flatMap
13. Filtering data using filter
14. Joining Data Sets - Introduction
15. Joining Data Sets - Inner Join
16. Joining Data Sets - Outer Join
17. Aggregations - Introduction
18. Aggregations - count and reduce - Get revenue for order id
19. Aggregations - reduce - Get order item with minimum subtotal for order id
20. Aggregations - countByKey - Get order count by status
21. Aggregations - understanding combiner
22. Aggregations - groupByKey - Get revenue for each order id
23. groupByKey - Get order items sorted by order_item_subtotal for each order id
24. Aggregations - reduceByKey - Get revenue for each order id
25. Aggregations - aggregateByKey - Get revenue and count of items for each order id
26. Sorting - sortByKey - Sort data by product price
27. Sorting - sortByKey - Sort data by category id and then by price descending
28. Ranking - Introduction
29. Ranking - Global Ranking using sortByKey and take
30. Ranking - Global using takeOrdered or top
31. Ranking - By Key - Get top N products by price per category - Introduction
32. Ranking - By Key - Get top N products by price per category - Python collections
33. Ranking - By Key - Get top N products by price per category - using flatMap
34. Ranking - By Key - Get top N priced products - Introduction
35. Ranking - By Key - Get top N priced products - using Python collections API
36. Ranking - By Key - Get top N priced products - Create Function
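
To give a feel for the "Get top N products by price per category - Python collections" lectures listed above, here is a minimal plain-Python sketch of per-key ranking. It is one possible approach and not the course's own solution: the product tuples, their field order (product_id, category_id, name, price), and the function name top_n_by_category are hypothetical. It groups records by category and keeps the N highest-priced products in each group; it does not use Spark itself.

```python
import heapq
from collections import defaultdict

# Hypothetical sample data: (product_id, category_id, name, price)
PRODUCTS = [
    (1, 2, "Helmet", 59.98),
    (2, 2, "Shoulder Pads", 129.99),
    (3, 2, "Jersey", 89.99),
    (4, 3, "Cleats", 139.99),
    (5, 3, "Ball", 29.99),
]


def top_n_by_category(products, n):
    """Return {category_id: [up to n products, highest price first]}."""
    # Group the products by category id (a local stand-in for a groupByKey step)
    by_category = defaultdict(list)
    for product in products:
        by_category[product[1]].append(product)

    # Within each group, keep the n products with the highest price
    return {
        category: heapq.nlargest(n, items, key=lambda p: p[3])
        for category, items in by_category.items()
    }


if __name__ == "__main__":
    for category, items in top_n_by_category(PRODUCTS, 2).items():
        print(category, [name for _, _, name, _ in items])
```

A function like the inner ranking step is the kind of logic that could be applied to each group after grouping by key on an RDD, which appears to be what the flatMap-based lectures build toward.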
4.1 out of 5
459 Ratings

Detailed Rating

5 stars: 196
4 stars: 160
3 stars: 62
2 stars: 22
1 star: 20
30-Day Money-Back Guarantee

Includes

32 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion