Big Data Internship Program – Data Ingestion: Sqoop and Flume
This course is part of the “Big Data Internship Program”, which is aligned to the stages of a typical Big Data project life cycle.
This course focuses on ingestion in Big Data.
Our course is divided into two parts: 1) technical knowledge with examples, and 2) work on a project.
- Big Data ingestion: concepts and methods
- Sqoop concepts and features
- Good understanding of Sqoop tools and their arguments
- Flume concepts and configuration
- Flume features: multiplexing, Flume agents, interceptors, etc.
- Understanding of the different file formats supported by Hadoop
- Access to our private GitHub repository
- Building the first part of our Book Recommendation project using Sqoop and Flume
In this video, we explain what data ingestion is, how data is processed, the challenges in data ingestion, and the key functions of data ingestion.
This Part 1 course focuses on the foundations of Big Data. It covers technical items such as:
- Refreshing your knowledge of Unix
- Java as used in Big Data
- Understanding Git/GitHub, which most companies use for source control
- Hadoop installation
Part 1 is free here.
In this video, we explain what data ingestion is and the tools available on the market.
In this video, we cover data ingestion tools such as Kafka, Chukwa, and Storm.
Different Types of File Formats in Hadoop
This video shows the different file formats supported in Hadoop.
CSV/text files are quite common and are often used for exchanging data between Hadoop and external systems.
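As a quick sketch of why CSV is so easy to exchange (the file name, columns, and sample rows below are our own illustration, not from the course):

```shell
# A CSV file is plain text: one record per line, comma-separated fields,
# and no schema or metadata stored with the data -- consumers must know
# the column layout in advance. (Sample data is hypothetical.)
printf 'id,title,rating\n1,Dune,5\n2,Emma,4\n' > books.csv

# Any text tool can read it by position; here awk prints the title column.
awk -F, 'NR > 1 { print $2 }' books.csv
```

The flip side of this simplicity is exactly what the videos discuss: the column order and types live only in the consumers' heads, not in the file.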
This video shows that Sequence files store data in a binary format with a structure similar to CSV. Like CSV, Sequence files do not store metadata with the data, so the only schema-evolution option is appending new fields.
Avro files are quickly becoming the best multi-purpose storage format within Hadoop. Avro files store metadata with the data and also allow specification of an independent schema for reading the file. Here we show you all about this file format.
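For illustration, a minimal Avro schema (the record and field names are hypothetical) showing the metadata that travels with the data, and that a reader may override with its own compatible schema:

```json
{
  "type": "record",
  "name": "Book",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "title", "type": "string"},
    {"name": "rating", "type": ["null", "int"], "default": null}
  ]
}
```

Giving `rating` a union type with `null` and a default is what makes later schema evolution safe: old readers and new readers can both handle files written before or after the field existed.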
RC Files, or Record Columnar Files, were the first columnar file format adopted in Hadoop. Like columnar databases, the RC file enjoys significant compression and query performance benefits. ORC Files, or Optimized RC Files, were invented to optimize performance in Hive and are primarily backed by Hortonworks. This video covers these two file formats.
Parquet Files are yet another columnar file format that originated from Hadoop creator Doug Cutting’s Trevni project. Like RC and ORC, Parquet enjoys compression and query performance benefits, and is generally slower to write than non-columnar file formats. In this video, you can learn more about this file format.
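As a sketch of how such a columnar format is typically used (the table and column names are our own example), Hive can write Parquet directly:

```sql
-- Hypothetical Hive table stored as Parquet; queries that touch only
-- one column read just that column's data, which is where the
-- compression and scan benefits come from.
CREATE TABLE books_parquet (
  id BIGINT,
  title STRING,
  rating INT
)
STORED AS PARQUET;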
In this video, we explain what Sqoop is, what Flume is, the Sqoop workflow, and the Sqoop architecture.
In this video, we explain what the import command is and how the Sqoop import command is executed.
In this video, we explain how to execute commands in the terminal, how to get the list of tables, how to get the list of databases, and how to import data into HDFS.
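The terminal session described above might look like the following sketch (the connection string, credentials, and table name are placeholders; running it needs a live MySQL server and a Hadoop cluster with Sqoop installed):

```shell
# List the databases visible through the JDBC connection
# (-P prompts for the password instead of putting it on the command line).
sqoop list-databases --connect jdbc:mysql://localhost:3306/ \
  --username hduser -P

# List the tables in one database.
sqoop list-tables --connect jdbc:mysql://localhost:3306/bookdb \
  --username hduser -P

# Import a table into HDFS; -m 1 runs a single map task.
sqoop import --connect jdbc:mysql://localhost:3306/bookdb \
  --username hduser -P \
  --table books --target-dir /user/hduser/books -m 1
```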
In this video, we explain how to run Sqoop commands, the structure of a Sqoop command, and the parameters used when executing Sqoop commands.
In this video, we explain what Sqoop export is and how it is used.
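A sketch of the export direction (paths and names are placeholders): export reads files already in HDFS and writes rows into an existing RDBMS table:

```shell
# Export HDFS files back into a MySQL table that must already exist;
# the delimiter tells Sqoop how the HDFS files are laid out.
sqoop export --connect jdbc:mysql://localhost:3306/bookdb \
  --username hduser -P \
  --table book_ratings \
  --export-dir /user/hduser/ratings \
  --input-fields-terminated-by ','
```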
In this video, we explain what Sqoop jobs are, how and when they are used, how to create jobs, and how to list the available Sqoop jobs.
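A minimal sketch of the job workflow (the job name `import_books` and connection details are placeholders):

```shell
# Save an import as a named job (note the space after the bare "--").
sqoop job --create import_books -- import \
  --connect jdbc:mysql://localhost:3306/bookdb \
  --username hduser -P \
  --table books --target-dir /user/hduser/books

# List saved jobs, inspect one, and run it.
sqoop job --list
sqoop job --show import_books
sqoop job --exec import_books
```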
In this video, we explain what incremental Sqoop is, how it works, what the incremental import parameters are, etc.
In this video, we explain how incremental import works and how to append data to the table.
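A sketch of an append-mode incremental import (the check column and last value are illustrative):

```shell
# Only rows whose check column exceeds --last-value are fetched;
# Sqoop reports the new last-value when the import finishes.
sqoop import --connect jdbc:mysql://localhost:3306/bookdb \
  --username hduser -P \
  --table books --target-dir /user/hduser/books \
  --incremental append \
  --check-column id \
  --last-value 100
```

When an incremental import is saved as a Sqoop job, the last value is stored and updated automatically between runs, which is why jobs and incremental imports are usually taught together.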
In this video, we explain what Flume is, where it is used, and the difference between Flume and Sqoop.
In this video, we explain how Flume works, what a Flume agent is, what the components of a Flume agent are, and how data flows between the various components of Flume.
In this video, we explain the components of Flume and how they are configured, i.e. how a Flume agent is configured.
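A minimal agent configuration in the usual Flume properties style (the agent and component names `a1`, `r1`, `c1`, `k1` are conventional placeholders): one netcat source, one in-memory channel, and one logger sink, wired together:

```properties
# Name the components on this agent.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen on a TCP port and turn each line into an event.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: log events at INFO level (useful for testing).
a1.sinks.k1.type = logger

# Wire source and sink to the channel.
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```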
In this video, we explain how to run a Flume agent and get a result.
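Launching an agent typically looks like this sketch (the config file name `example.conf` and agent name `a1` are assumptions; it needs a Flume installation):

```shell
# Start the agent named a1 from a properties file, logging to the console.
flume-ng agent \
  --conf ./conf \
  --conf-file example.conf \
  --name a1 \
  -Dflume.root.logger=INFO,console
```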
In this video, we explain what a multi-agent Flume flow is and what consolidation in Flume is.
In this video, we explain what multiplexing is, the use of multiplexing, channel selectors, etc.
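In the same properties style, a multiplexing channel selector routes each event by a header value (the header name `state` and the mappings below are illustrative):

```properties
# Route events to different channels based on the "state" header.
a1.sources.r1.channels = c1 c2 c3
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.US = c1
a1.sources.r1.selector.mapping.IN = c2
# Events whose header matches no mapping go to the default channel.
a1.sources.r1.selector.default = c3
```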
In this video, we explain what an interceptor is, why it is used, how it is configured, how it runs, and what the types of interceptors are.
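As a small configuration sketch, two of Flume's built-in interceptors chained on a source (component names follow the same placeholder convention as above):

```properties
# Interceptors run on each event before it reaches the channel:
# i1 adds a timestamp header, i2 adds the agent host's name.
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i2.type = host
```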
In this video, we explain what recommendation is, using book recommendation concepts.
In this video, we show how to load data into MySQL and then how to import that data into HDFS through Sqoop commands.
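The two steps above could be sketched as follows (file, database, and table names are placeholders; it needs MySQL and a Sqoop-enabled cluster):

```shell
# Step 1: load a local CSV into a MySQL table.
mysql --local-infile -u hduser -p bookdb \
  -e "LOAD DATA LOCAL INFILE 'books.csv' INTO TABLE books
      FIELDS TERMINATED BY ',' IGNORE 1 LINES;"

# Step 2: pull the table into HDFS with Sqoop.
sqoop import --connect jdbc:mysql://localhost:3306/bookdb \
  --username hduser -P \
  --table books --target-dir /user/hduser/books -m 1
```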
In this video, we explain what a script is and how we can execute our job using a shell script.
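A wrapper script of this kind might look like the following sketch (the script and job names are our own, not the course's):

```shell
#!/bin/sh
# run_ingest.sh -- hypothetical wrapper: executes a saved Sqoop job and
# logs start/finish times so it can be scheduled, e.g. from cron.
set -e
JOB_NAME="${1:?usage: run_ingest.sh <sqoop-job-name>}"
echo "$(date '+%F %T') starting Sqoop job: $JOB_NAME"
sqoop job --exec "$JOB_NAME"
echo "$(date '+%F %T') finished Sqoop job: $JOB_NAME"
```

It would be invoked as, for example, `./run_ingest.sh import_books`.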
In this video, we show how the book recommendation works and how ratings are generated in HDFS through Flume.