The MapReduce framework is Hadoop's native data-processing engine. It is the atomic processing unit in Hadoop, which is why it remains foundational to the ecosystem.
Knowing only the basics of MapReduce (Mapper, Reducer, etc.) is not sufficient to work on a real-world company Hadoop MapReduce project. Those basics are just the tip of the iceberg in MapReduce programming. In live Hadoop MapReduce projects we have to override many of the framework's default implementations to make them work according to our requirements.
This course answers the question: "Which Hadoop MapReduce concepts are used in live projects, and how are they implemented in a program?" To answer it, every MapReduce concept in the course is demonstrated practically via a MapReduce program.
Every lecture in this course is presented in two steps.
Step 1: Explanation of a Hadoop component | Step 2: Practicals – how to implement that component in a MapReduce program.
The overall inclusions and benefits of this course:
Complete Hadoop MapReduce explained, from scratch through real-time implementation.
Each and every Hadoop concept is backed by hands-on MapReduce code.
Advanced-level MapReduce concepts that are hard to find anywhere else on the Internet.
For learners without a Java background, all MapReduce Java code is explained line by line, so that even a non-technical person can follow it.
The MapReduce code and datasets used in the lectures are attached for your convenience.
Includes a 'Case Studies' section covering questions commonly asked in Hadoop interviews.
Introduction
This first lecture of the course introduces Hadoop's core processing framework: MapReduce.
A short announcement inviting you to rate this Hadoop MapReduce course.
This video explains the difference between the traditional approach and the Hadoop approach to parallel processing of big data, and shows how Hadoop handles most of the work by itself.
This lecture explains the basic flow of a Hadoop MapReduce program.
Continuing from the previous lecture, this video walks through the basic flow of a Hadoop MapReduce program with an example.
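As a companion to the flow walkthrough, here is a minimal in-memory sketch of the map → shuffle → reduce pipeline on a hypothetical example (maximum temperature per city; the city names and "city,temp" line format are invented for illustration). This is plain Java, not the Hadoop API.

```java
import java.util.*;

// In-memory sketch of the MapReduce flow: map emits (city, temp),
// the shuffle groups temps by city, and reduce takes the max per city.
public class MaxTempFlow {
    public static Map<String, Integer> run(List<String> lines) {
        // map + shuffle: parse "city,temp" and group temperatures by city
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            grouped.computeIfAbsent(f[0], k -> new ArrayList<>()).add(Integer.parseInt(f[1]));
        }
        // reduce: one output record (city, maxTemp) per key
        Map<String, Integer> maxByCity = new TreeMap<>();
        grouped.forEach((city, temps) -> maxByCity.put(city, Collections.max(temps)));
        return maxByCity;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("pune,31", "delhi,40", "pune,35")));
        // {delhi=40, pune=35}
    }
}
```

In a real Hadoop job the same three phases are split across a Mapper class, the framework's shuffle, and a Reducer class running on different nodes.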
This video explains the basic file input format types in Hadoop MapReduce. Hadoop provides six file input formats by default, which we can use directly in a MapReduce program.
Default structure of various classes in Mapreduce
This video explains the default structure of a Mapper class in a Hadoop MapReduce program.
This video explains the default structure of a Reducer class in a Hadoop MapReduce program.
This video explains the default structure of a Driver class in a Hadoop MapReduce program.
This video explains the default structure of a Partitioner class in a Hadoop MapReduce program. By default, Hadoop uses the HashPartitioner class in a MapReduce program.
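To give a feel for the default partitioner, here is a plain-Java sketch of the formula Hadoop's HashPartitioner uses to pick a reducer for each key, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks` (the class name and formula are from the Hadoop codebase; the sketch itself does not use the Hadoop API).

```java
// Sketch of the default HashPartitioner logic: hash the key, mask the sign
// bit so the result is non-negative, then take it modulo the reducer count.
public class HashPartitionSketch {
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // the same key always lands on the same reducer, in range [0, numReduceTasks)
        System.out.println(getPartition("hadoop", 3));
        System.out.println(getPartition("hadoop", 3));
    }
}
```

This determinism is what guarantees that all values for one key reach a single reducer.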
A detailed lecture on how shuffling, sorting, and partitioning are done internally in the Hadoop architecture. Hadoop performs all three steps by itself.
A step-by-step installation guide (PDF) for installing Hadoop and MapReduce on your system.
Word Count program in Mapreduce
A final lecture before the practicals: it explains which datatypes Hadoop uses, and how to use those Hadoop datatypes in a MapReduce program.
This lecture explains the basic word-count program in MapReduce.
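The word-count logic itself can be sketched in a few lines of plain Java: the map side emits a (word, 1) pair per occurrence and the reduce side sums the pairs per word. This is only an illustration of the logic, not the Hadoop Mapper/Reducer code covered in the lecture.

```java
import java.util.*;

// Plain-Java sketch of word count: emit (word, 1) per token, sum per word.
public class WordCountSketch {
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {  // map: emit (word, 1)
                if (!word.isEmpty())
                    counts.merge(word, 1, Integer::sum);     // reduce: sum per word
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("to be or", "not to be")));
        // {be=2, not=1, or=1, to=2}
    }
}
```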
After walking through the Hadoop MapReduce word-count code, this lecture shows how to write that code in Eclipse, create a JAR file from it, and run it on a Hadoop cluster.
This lecture explains an optimization technique in Hadoop: the combiner. What a combiner is, at which phase of the MapReduce program flow it runs, and how it works.
As explained in the theory lecture, a combiner can give us better optimization in Hadoop. In this video we learn how to implement a combiner class in a MapReduce program.
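The payoff of a combiner can be sketched without a cluster: it pre-aggregates one map node's (word, 1) pairs into (word, partialSum) pairs before the shuffle, so fewer records cross the network. This is plain Java for illustration; in a real job the combiner is registered in the driver (typically `job.setCombinerClass(...)`, often reusing the reducer class).

```java
import java.util.*;

// Sketch of the combiner's local aggregation on a single map node's output.
public class CombinerSketch {
    // combine: collapse (word, 1) pairs into one (word, partialSum) per word
    public static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String w : words) partial.merge(w, 1, Integer::sum);
        return partial;
    }

    public static void main(String[] args) {
        List<String> mapOutput = Arrays.asList("the", "cat", "the", "the");
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println(mapOutput.size() + " records shrink to " + combined.size());
        // 4 records shrink to 2
        System.out.println(combined);  // {cat=1, the=3}
    }
}
```

The reducer then sums the partial sums, which is why a combiner must be an associative, commutative operation.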
Set of Mapreduce programs
This lecture explains how to compute the sums of even and odd numbers using a Hadoop MapReduce program.
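The core idea can be sketched in plain Java: the map phase keys each number by its parity ("even" or "odd"), and the reduce phase sums the values per key. The key names are illustrative; this is not the lecture's actual Hadoop code.

```java
import java.util.*;

// Sketch of the even/odd-sum job: key by parity, sum per key.
public class EvenOddSumSketch {
    public static Map<String, Long> sums(int... numbers) {
        Map<String, Long> out = new TreeMap<>();
        for (int n : numbers) {
            String key = (n % 2 == 0) ? "even" : "odd";  // map: emit (parity, n)
            out.merge(key, (long) n, Long::sum);         // reduce: sum per parity
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sums(1, 2, 3, 4, 5));  // {even=6, odd=9}
    }
}
```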
In this video we use a MapReduce program to calculate the city-wise average success rate of Facebook ads across different categories.
Hadoop provides predefined datatypes, but it also gives us the freedom to create our own datatypes in the form of Writables.
This video explains how to create our own Hadoop-recognized datatypes using a MapReduce program.
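A custom Hadoop datatype implements the Writable interface's two methods, `write(DataOutput)` and `readFields(DataInput)`, serializing and deserializing the fields in the same order. The sketch below shows that exact pattern in plain Java (without depending on the Hadoop `Writable` interface itself); the `AdStatsSketch` type and its fields are hypothetical.

```java
import java.io.*;

// Sketch of the Writable pattern for a hypothetical (adId, clicks) type.
public class AdStatsSketch {
    public String adId;
    public int clicks;

    // serialize the fields, in a fixed order
    public void write(DataOutput out) throws IOException {
        out.writeUTF(adId);
        out.writeInt(clicks);
    }

    // deserialize the fields, in the same order
    public void readFields(DataInput in) throws IOException {
        adId = in.readUTF();
        clicks = in.readInt();
    }

    // round-trip through a byte stream, as Hadoop would during the shuffle
    public static AdStatsSketch roundTrip(AdStatsSketch a) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            a.write(new DataOutputStream(buf));
            AdStatsSketch b = new AdStatsSketch();
            b.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return b;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AdStatsSketch a = new AdStatsSketch();
        a.adId = "ad-42";
        a.clicks = 7;
        AdStatsSketch b = roundTrip(a);
        System.out.println(b.adId + " " + b.clicks);  // ad-42 7
    }
}
```

Keeping write and read in matching field order is the invariant the Writable contract depends on.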
One more example of implementing Hadoop Writables in MapReduce.
In this lecture we identify the fraudulent customers of an e-commerce website. The full MapReduce code is attached in the resources tab.
Distributed Cache, Input Split, Multiple Inputs class
This lecture covers the theory of the distributed cache in Hadoop: what it is, what it is used for, and so on.
Using our knowledge of the distributed cache in Hadoop, in this video we implement it in a MapReduce program.
All the MapReduce code used in the video is attached.
This lecture shows the default MapReduce code for the input split class in Hadoop.
Hadoop also lets us read more than one input file at a time; this lecture shows exactly how to do this in a MapReduce program.
Hadoop also lets us write to more than one output directory; this lecture shows exactly how to do this in a MapReduce program.
Quiz 1
Joins in Mapreduce
A lecture showing the pseudocode flow of joins in a MapReduce program. It explains the thought process behind doing joins in MapReduce.
This video shows how to join two files in a Hadoop MapReduce program.
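The reduce-side join idea can be sketched in plain Java: each map record is tagged with its source file, the shuffle groups records by the join key, and the reducer crosses the two sides for each key. The "C|"/"O|" tags and the customer/order line formats are invented for illustration; this is not the Hadoop API.

```java
import java.util.*;

// Sketch of a reduce-side join of two tagged inputs on their first CSV field.
public class JoinSketch {
    public static List<String> join(List<String> custLines, List<String> orderLines) {
        // map + shuffle: tag each record with its source, group by join key
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String l : custLines)
            grouped.computeIfAbsent(l.split(",")[0], k -> new ArrayList<>()).add("C|" + l);
        for (String l : orderLines)
            grouped.computeIfAbsent(l.split(",")[0], k -> new ArrayList<>()).add("O|" + l);

        // reduce: split the tagged records per key and emit their cross product
        List<String> out = new ArrayList<>();
        for (List<String> records : grouped.values()) {
            List<String> cust = new ArrayList<>(), ord = new ArrayList<>();
            for (String r : records) (r.startsWith("C|") ? cust : ord).add(r.substring(2));
            for (String c : cust) for (String o : ord) out.add(c + " <-> " + o);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(join(
                Arrays.asList("u1,Alice", "u2,Bob"),
                Arrays.asList("u1,order9", "u1,order12")));
    }
}
```

Note that u2 produces no output here, which is the inner-join behavior; an outer join would also emit unmatched records.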
After performing an inner join in MapReduce, in this lecture we perform an outer join.
What a map join is, and in which scenarios it should be used in MapReduce.
Having covered the map-join theory, we implement it in a MapReduce program with an example.
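The essence of a map join is that the small table sits in memory on every mapper (in Hadoop, typically shipped via the distributed cache), so each big-side record is joined during the map phase with no shuffle at all. A plain-Java sketch, with a hypothetical airport-code lookup table:

```java
import java.util.*;

// Sketch of a map-side join: in-memory lookup replaces the shuffle entirely.
public class MapJoinSketch {
    public static List<String> mapJoin(Map<String, String> smallTable, List<String> bigLines) {
        List<String> out = new ArrayList<>();
        for (String line : bigLines) {            // one "map" call per big-side record
            String key = line.split(",")[0];
            String looked = smallTable.get(key);  // join happens right here, in memory
            if (looked != null) out.add(line + "," + looked);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> cities = new HashMap<>();
        cities.put("NYC", "New York");
        System.out.println(mapJoin(cities, Arrays.asList("NYC,350", "LAX,120")));
        // [NYC,350,New York]
    }
}
```

This is why a map join only pays off when one side is small enough to fit in each mapper's memory.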
Counters in Mapreduce
What counters are in Hadoop, what purpose they serve in the Hadoop architecture, and the various types of counters Hadoop supports.
What job counters are in Hadoop, and how to check counter status after a MapReduce program runs.
We can create custom counters according to our requirements. There are two types of custom counters in the Hadoop MapReduce framework:
Static counters
Dynamic counters
In this lecture we create custom counters to count the number of records processed, based on a condition, in a store's sales file. MapReduce code attached.
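The static/dynamic distinction can be sketched in plain Java: a static counter's name is fixed in the code, while a dynamic counter's name is built from the data itself. In a real job both are incremented via `context.getCounter(group, name).increment(1)`; the counter names and the sales-record format below are hypothetical.

```java
import java.util.*;

// Sketch of custom counters over a store's sales records.
public class CounterSketch {
    public static final Map<String, Long> counters = new TreeMap<>();

    static void increment(String name) {
        counters.merge(name, 1L, Long::sum);
    }

    public static void main(String[] args) {
        String[] sales = {"storeA,500", "storeB,40", "storeA,900"};
        for (String record : sales) {
            long amount = Long.parseLong(record.split(",")[1]);
            // static counter: the name is fixed in the code
            increment(amount > 100 ? "SALES_OVER_100" : "SALES_UNDER_100");
            // dynamic counter: the name is derived from the data
            increment("STORE_" + record.split(",")[0]);
        }
        System.out.println(counters);
        // {SALES_OVER_100=2, SALES_UNDER_100=1, STORE_storeA=2, STORE_storeB=1}
    }
}
```

Dynamic counters should be used sparingly, since every distinct name in the data creates a new counter.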
Creating Custom Input Formatter
What the file input format class is in Hadoop MapReduce, which methods it contains, and which of them must be overridden in a MapReduce program to create your own input formatter.
This lecture explains why and when you need to create a custom input format class in MapReduce. Hadoop gives us the option to create our own input format class to read the input file.
In this lecture we create a custom input format class to read an XML file. Four MapReduce Java classes are used for this, and a complete run on a Hadoop cluster is also shown.
Quiz 2
Different Types of Files in Hadoop
The sequence file is one of the key-value file formats supported by Hadoop. In this lecture we learn how to read and write sequence files in MapReduce.
Chaining in Mapreduce
How to chain multiple mappers in a single MapReduce program.
After chaining mappers, we learn how to chain multiple MapReduce programs in a single execution.
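Job chaining boils down to feeding job 1's output in as job 2's input (in Hadoop, job 2's input path is set to job 1's output path, and job 2 is submitted only after job 1 completes). A plain-Java sketch with two invented "jobs", word count followed by a frequency filter:

```java
import java.util.*;

// Sketch of chaining two jobs: job1's output becomes job2's input.
public class ChainSketch {
    // job 1: word count
    static Map<String, Integer> job1(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String l : lines)
            for (String w : l.split("\\s+")) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // job 2: keep only words that occurred more than once
    static Map<String, Integer> job2(Map<String, Integer> counts) {
        Map<String, Integer> frequent = new TreeMap<>();
        counts.forEach((w, c) -> { if (c > 1) frequent.put(w, c); });
        return frequent;
    }

    public static void main(String[] args) {
        Map<String, Integer> out1 = job1(Arrays.asList("a b a", "c b a"));
        System.out.println(job2(out1));  // {a=3, b=2}
    }
}
```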
Real-Time Case Studies in Mapreduce
Hadoop MapReduce is widely used by financial institutions to process data. One such case study is explained in this lecture, where a bank tries to find the list of its loyal customers.
The case study is worked through with a full-fledged MapReduce program.
In this case study we use Hadoop MapReduce to predict churning customers | Part 1
In this case study we use Hadoop MapReduce to predict churning customers | Part 2
A Hadoop MapReduce case study on flight data to find under-utilized flights.