This course is intended for users who want to learn Hadoop technologies.
The training includes complete, practical training on Big Data Hadoop (MapReduce, Pig, and Hive) with real-time applications. You can develop a complete Big Data Hadoop project using MapReduce, Pig, and Hive, since sample data and sample code are provided with this course.
Big Data and HADOOP (Concept)
Understanding MapReduce and HADOOP Installation (Concept)
Apache PIG (Concept)
Project: Banking and finance domain project using MapReduce, Pig, and Hive
Project: Analyze Loan Dataset
Industry: Banking and Finance
Data: A publicly available dataset containing complete details of all issued loans, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information.
Problem Statement:
1. Calculate the overall average risk
2. Calculate the average risk per location
3. Calculate the average risk per category/loan type
4. Calculate the average risk per location and category
Input Data Files are attached here
Calculate Average Risk using MapReduce
class Mapper
{
    setup()
    {
    }
    map()    // input key = byte offset, input value = one line of text
    {
        // parse the risk field from the line and emit (key, risk)
    }
    cleanup()
    {
    }
}
class Reducer
{
    setup()
    {
    }
    reduce()    // input key = null (single group), input values = risk scores
    {
        // accumulate: sum += risk, count += 1
    }
    cleanup()
    {
        avg = sum / count    // overall average risk
    }
}
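To make the sketch concrete, here is a minimal Java reducer for problem 1, assuming the mapper emits every risk value under the single constant key NullWritable.get() and the driver sets setMapOutputKeyClass(NullWritable.class); the class name OverallAvgReducer is illustrative, not part of the course code, and the full per-category driver, mapper, and reducer appear later in this section.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative sketch: with a single constant key, one reduce() call sees
// every risk value, so the overall average can be emitted directly.
public class OverallAvgReducer extends Reducer<NullWritable, DoubleWritable, Text, DoubleWritable>
{
    @Override
    public void reduce(NullWritable key, Iterable<DoubleWritable> values, Context con) throws IOException, InterruptedException
    {
        double sum = 0;
        int count = 0;
        for (DoubleWritable v : values)
        {
            sum += v.get();    // accumulate every risk value in the dataset
            count++;
        }
        con.write(new Text("Overall Average Risk"), new DoubleWritable(sum / count));
    }
}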
Calculate Average Risk per Location using MapReduce
Calculate Average Risk per Category using MapReduce
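The per-location and per-category jobs differ from the overall-average job only in the map output key. Below is a minimal sketch of a per-location mapper, assuming (as in the commented-out problem-2 code later in this section) that the location string sits in column 9 (index 8) of the CSV; the class name LocationMapper is illustrative.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative sketch: same pipeline as the per-category job, but the
// location string becomes the map output key, so the reducer groups
// risk values by location.
public class LocationMapper extends Mapper<LongWritable, Text, Text, DoubleWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
    {
        String[] linePart = value.toString().split(",");
        double risk = Double.parseDouble(linePart[7]);    // risk score (column 8)
        String loc = linePart[8];                         // location (column 9)
        con.write(new Text(loc), new DoubleWritable(risk));
    }
}

For problem 4 (average risk per location and category), the same pattern applies with a composite key such as loc + "," + cat.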
Banking and Finance Domain Analysis using Pig
Environment Setup and Import Data using Sqoop
Banking and Finance Domain Analysis using Hive
Sample Input Data and Sample Code
MyDriver.java
------------------------------------------------------------------------
package BankingAvrageRisk;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class MyDriver
{
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException
    {
        Configuration conf = new Configuration();
        // problem-2: pass the target location in through the configuration
        //conf.set("LocSearch", args[0]);
        Job job = Job.getInstance(conf, "Calculate Avg Risk per Category/Loan Type");
        job.setJarByClass(MyDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);             // final output key type
        job.setOutputValueClass(DoubleWritable.class); // final output value type
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
--------------------------------------------------------------------
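Assuming the three classes are compiled and packaged into a jar (say banking-risk.jar, a hypothetical name), the job would typically be launched with hadoop jar banking-risk.jar BankingAvrageRisk.MyDriver <input-path> <output-path>. Note that args[0] is the input path and args[1] the output path, and the output directory must not already exist or the job will fail.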
MyMapper.java
-------------------------------------------------------------------
package BankingAvrageRisk;
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MyMapper extends Mapper<LongWritable, Text, Text, DoubleWritable>
{
    @Override
    public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] linePart = line.split(",");
        // problem-1: overall average risk; the risk score is column 8 (index 7)
        Double risk = Double.parseDouble(linePart[7]);
        // problem-2: average risk per location; the location is column 9 (index 8)
        //String loc = linePart[8];
        // problem-3: average risk per category; the two-letter loan-type
        // code sits at index 1..2 of column 3
        String cat = linePart[2].substring(1, 3);
        if (cat.equals("HL"))
        {
            cat = "Home Loan";
        }
        else if (cat.equals("PL"))
        {
            cat = "Personal Loan";
        }
        else if (cat.equals("VL"))
        {
            cat = "Vehicle Loan";
        }
        else
        {
            cat = "Retailer Loan";
        }
        con.write(new Text(cat), new DoubleWritable(risk));
    }
}
----------------------------------------------------------------------
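One caveat: Double.parseDouble(linePart[7]) will throw a NumberFormatException and fail the task if the dataset contains a header row or a malformed line. A common guard (not part of the original course code) is to check linePart.length before indexing and to catch the exception, simply returning from map() for bad records.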
MyReducer.java
---------------------------------------------------------------------
package BankingAvrageRisk;
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class MyReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
    // running totals across all keys; only needed by the commented-out
    // overall-average cleanup() variant below
    double glSum = 0;
    int glcount = 0;
    @Override
    public void reduce(Text key, Iterable<DoubleWritable> val, Context con) throws IOException, InterruptedException
    {
        // per-key totals must start from zero for every key, so keep them local
        double sum = 0;
        int count = 0;
        for (DoubleWritable values : val)
        {
            sum += values.get();
            count++;
        }
        glSum += sum;
        glcount += count;
        con.write(key, new DoubleWritable(sum / count));
    }
    /* problem-1: emit a single overall average after all keys are reduced
    public void cleanup(Context con) throws IOException, InterruptedException
    {
        con.write(new Text(""), new DoubleWritable(glSum / glcount));
    }*/
}
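A note on the design: because the driver sets setNumReduceTasks(1), a single reducer instance sees every key, which is what allows the commented-out cleanup() variant to compute one overall average across the whole dataset from the glSum and glcount running totals. With more than one reduce task, each reducer would emit only a partial average.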
Sample Input Data Sets are attached here
Copy this content from the PDF and save it as Java files.