Taming Big Data using Spark & Python

Big Data projects and CCA 175 exam preparation made easy, with project scenarios and practice questions for CCA 175
Instructor:
Anshul Roy
Big Data and its ecosystem: Hadoop, Sqoop, Hive, Flume, Kafka, Spark using Python, Spark SQL & Spark Streaming
Both the concepts (theory & architecture) and the practicals
Assignments & project scenarios modeled on real projects
Practice questions for CCA 175 Certification
Process continual streams of data with Spark Streaming
Build, deploy, and run Spark scripts on Hadoop clusters
Transform structured data using SparkSQL and DataFrames

This course is for those who know nothing about Big Data and its tools and want to learn them well enough to implement them comfortably in projects. It is also for those who have some knowledge of Big Data tools but want to deepen it and become comfortable working in projects. Because of its extensive scenario implementations, the course also suits people preparing for Big Data certifications such as CCA 175, and it includes a practice test for CCA 175.

The course comes with fully functional Big Data labs on Cloudera and Windows VMs, so you rarely need to buy a cluster to practice the tools. That makes the course a one-time investment in a secure future.

In the course, we will learn how to use Big Data tools such as Hadoop, Flume, Kafka, Spark, and Scala, which are among the most valuable tech skills on the market today.

In this course I will show you how to:

1. Use Python and Spark to analyze Big Data.

2. Prepare for the CCA 175 exam with the practice test available at the end of the course.

3. Work through extensive, realistic project scenarios with solutions, written as you would write them in real projects.

4. Use Sqoop to import data from traditional relational databases into HDFS and Hive.

5. Use Flume and Kafka to process streaming data.

6. Use Hive to store and view data, and to partition tables.

7. Use Spark Streaming to fetch streaming data from Kafka and Flume.

Big Data skills are among the most in demand right now, and with this course you can learn them quickly and easily! You will also learn the components of the basic setup in files such as “hdfs-site.xml” and “core-site.xml”, which are good to know when working on a project.
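To make the configuration files mentioned above more concrete, here is a minimal Python sketch of how a Hadoop-style file is laid out and read: a `configuration` root holding `property` entries, each with a `name` and a `value`. The two XML snippets, the host and port, and the helper name `read_hadoop_properties` are illustrative assumptions, not the contents of the course's lab VMs; real files carry many more properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down file contents for illustration only.
CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Return a dict mapping each <name> to its <value> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Merge both files into one view of the settings.
settings = {}
settings.update(read_hadoop_properties(CORE_SITE))
settings.update(read_hadoop_properties(HDFS_SITE))

for name, value in sorted(settings.items()):
    print(f"{name} = {value}")
```

In this sketch, `fs.defaultFS` is the default file system URI set in core-site.xml and `dfs.replication` is the block replication factor set in hdfs-site.xml; to inspect a real installation, you would read the file from disk instead of using the inline strings.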

The course focuses on upskilling people who do not know Big Data tools, with the goal of bringing them up to the mark so they can work on Big Data projects without issues.

This course comes with project scenarios and multiple datasets to work with.

After completing this course, you will feel comfortable putting Big Data, Python, and Spark on your resume, and you will be able to apply them in projects!

Thanks and I will see you inside the course!

Use the Windows/Cloudera VM provided in the course

1. Setup VM
2. Windows HDFS Error & Fix

Learning Hadoop - Architecture, Concepts & Implementation

1. Hadoop Architecture - Part 1 - Basics of Hadoop
2. Hadoop Architecture - Part 2 - Understanding NameNode and DataNode
3. Hadoop Architecture - Part 3 - Understanding Job Tracker & Task Tracker
4. Hadoop Refresh & File Systems
5. Hadoop Terminologies & Configurations in XML Files
6. Hadoop Commands on Windows or Windows VM - Part 1
7. Hadoop Commands on Windows or Windows VM - Part 2
8. Hadoop Commands on Cloudera Quick Start VM

Learning Sqoop - Architecture, Concepts & Implementation

1. Sqoop Architecture
2. Sqoop Eval on Windows/Windows VM
3. Sqoop Eval on Windows - Using -e & --query options
4. Sqoop List Databases and List Tables - Used for creating Generic Code
5. Sqoop Import Command - Understanding and Analysing the Map-Reduce Functionality
6. Sqoop Import - Append Mode of Execution
7. Sqoop Import - Overwrite option & Different File Formats supported
8. Sqoop Import - Using Where & Columns Options to filter the data import
9. Sqoop Import - Executing User Specific Query with Where Clause
10. Sqoop Import - Incremental Load Execution
11. Sqoop Jobs - Create, List & Execute Sqoop Jobs
12. Sqoop Import All Option to Import all tables from MySQL to HDFS
13. Sqoop Import - Import from MySQL To Hive - Basic Import
14. Sqoop Import - Import from MySQL To Hive - More Options
15. Sqoop Import All - Import from MySQL to Hive using Import All
16. Sqoop Import - from Mainframe - Basic Know-How
17. Sqoop Export - Bring Data from HDFS to MySQL
18. Sqoop Assignment for Practice

Learning Hive - Architecture, Concepts & Implementation

1. Hive - Introduction & Features
2. Hive - Architecture & Map-Reduce Execution
3. Hive Tables
4. Hive Partitioning & Bucketing - Concepts and Difference
5. Hive Query Language - Overview and Syntax
6. Hive QL - Practicals - Create Database & Tables & load sample data
7. Hive QL - Practicals - Load Huge Data to Managed Tables
8. Hive QL - Practicals - Creating and Loading Managed & External Tables
9. Hive QL - Practicals - Partitioning in Hive
10. Hive QL - Practicals - Bucketing in Hive
11. Hive User Defined Functions
12. Hive Performance Tuning Methods

Learning Flume - Architecture, Concepts & Implementation

1. Flume - Concepts, Usage, Features & Advantages
2. Flume Architecture
3. Flume Data Flows, Contextual Routing & Other Concepts
4. Basics of Flume Configurations
5. Setup of Telnet in Windows
6. Flume Practicals - Simple Flume Job using NetCat
7. Flume Practicals - Flume Job using EXEC
8. Flume Practicals - Flume Job using Sequence Generator
9. Flume Practicals - Flume Job using Sequence Generator on HDFS
10. Flume Practicals - Flume Job using Twitter on Windows
11. Flume Practicals - Flume Job using Twitter on Cloudera
12. Flume Practicals - Flume Job using Twitter on File Channel
13. Flume Practicals - Flume Job using Twitter to Hive Sink
14. Flume Multiplexing - One Source, One Channel & Two Sinks - Logger and HDFS Sinks
15. Industry Usage of Flume

Learning Kafka - Architecture, Concepts & Implementation

1. Kafka Concepts and Architecture 1
2. Kafka Concepts and Architecture 2
3. Kafka Concepts and Architecture 3
4. Kafka Sample Execution on Cloudera
5. Flume and Kafka Together

Learning Python

1. Basics of coding environment for Python
2. Executing Print in CLI & Jupyter Notebook
3. Creating Variables & Indented Code in Python
4. Python Variables - Initialize, Assign & Reassign
5. Python Math Functionalities
6. Python Math Help

Learning Spark - Architecture & Concepts

1. Spark Architecture
2. Spark Components, Lazy Executions, DAG, SparkSQL, Performance Tuning etc.
3. Spark - Shuffles, Coalesce, Repartition & Shared Variables
4. Spark Streaming Concepts & DStream
5. Spark - RDD vs DataFrame vs Datasets
6. Spark - Catalyst Optimizer and Tungsten Engine

Project Scenarios

1. Overall Big Data Project Structure
2. Project Scenario - Bring Data from BI Database to Data Lake in Layer 1
3. Project Scenario 2
4. Project Scenario 3 - Bring Files from Local File System to HDFS in Data Lake
5. Project Scenario 4 - Create Generic Jobs to read data from Data Lake to Layer 2
6. Project Scenario 5 - Use SparkSQL to read data from Layer 2 and write to Layer 3
7. Project Scenario 6 - Merge Multiple Files

Practice Test for CCA 175 Exam

1. Practice Test in PDF for CCA 175 Exam

30-Day Money-Back Guarantee

Includes

23 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion