4.4 out of 5
144 reviews on Udemy

Flume and Sqoop for Ingesting Big Data

Import data to HDFS, HBase and Hive from a variety of sources, including Twitter and MySQL
Loony Corn
3,282 students enrolled
Use Flume to ingest data to HDFS and HBase
Use Sqoop to import data from MySQL to HDFS and Hive
Ingest data from a variety of sources including HTTP, Twitter and MySQL

Taught by a team that includes two Stanford-educated ex-Googlers. The team has decades of practical experience working with Java and with billions of rows of data.

Use Flume and Sqoop to import data to HDFS, HBase and Hive from a variety of sources, including Twitter and MySQL

Let’s parse that.

Import data: Flume and Sqoop play a special role in the Hadoop ecosystem. They transport data from sources that hold or produce it, such as local file systems, HTTP, MySQL and Twitter, to data stores like HDFS, HBase and Hive. Both tools come with built-in functionality that shields users from the complexity of moving data between these systems.

Flume: Flume Agents can transport data produced by a streaming application to data stores like HDFS and HBase. 

Sqoop: Use Sqoop to bulk import data from a traditional RDBMS into Hadoop storage such as HDFS or Hive.

What’s Covered:

Practical implementations for a variety of sources and data stores ..

  • Sources : Twitter, MySQL, Spooling Directory, HTTP
  • Sinks : HDFS, HBase, Hive

.. Flume features : 

Flume Agents, Flume Events, Event bucketing, Channel selectors, Interceptors

.. Sqoop features : 

Sqoop import from MySQL, Incremental imports using Sqoop Jobs

You, This Course and Us

Let's start with an introduction to the course and what you'll know by the end of it.

Why do we need Flume and Sqoop?

Let's understand Flume and Sqoop and their role in the Hadoop ecosystem.


Installing Flume

Installing Flume is pretty straightforward. 

Flume Agent - the basic unit of Flume

A Flume Agent is the most basic unit that can exist independently in Flume. An Agent is made up of Sources, Sinks and Channels.

Example 1 : Spool to Logger

Our first example of a Flume Agent using a Spooling Directory Source, a File Channel and a Logger Sink
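As a rough sketch of how such an agent is wired together, the Python snippet below renders a Flume properties file for a Spooling Directory Source, a File Channel and a Logger Sink. The agent name, the component names (`src1`, `ch1`, `sink1`) and the directory path are placeholders chosen for illustration, not values taken from the course.

```python
def render_agent_config(agent: str, spool_dir: str) -> str:
    """Render a minimal Flume agent config: spooldir source -> file channel -> logger sink."""
    # Component names are hypothetical; any identifiers work as long as they match up.
    src, ch, sink = "src1", "ch1", "sink1"
    lines = [
        # Declare the components that belong to this agent.
        f"{agent}.sources = {src}",
        f"{agent}.channels = {ch}",
        f"{agent}.sinks = {sink}",
        # Spooling Directory Source: picks up files dropped into a directory.
        f"{agent}.sources.{src}.type = spooldir",
        f"{agent}.sources.{src}.spoolDir = {spool_dir}",
        # File Channel: buffers events on disk between source and sink.
        f"{agent}.channels.{ch}.type = file",
        # Logger Sink: writes each event to the agent's log.
        f"{agent}.sinks.{sink}.type = logger",
        # Wire the source and the sink to the channel.
        f"{agent}.sources.{src}.channels = {ch}",
        f"{agent}.sinks.{sink}.channel = {ch}",
    ]
    return "\n".join(lines) + "\n"


# Example with a placeholder agent name and directory.
print(render_agent_config("agent1", "/tmp/spool"), end="")
```

Saved to a file, a config like this would typically be handed to `flume-ng agent` through its `--conf-file` and `--name` options.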

Flume Events are how data is transported

A Flume event represents 1 record of data. Flume events consist of event headers and the event body.
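The header-plus-body shape of an event can be pictured with a small Python stand-in. This is only an illustrative model, not Flume's own Java `Event` interface; the header name and the record contents are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Illustrative stand-in for a Flume event: string headers plus a byte body."""
    body: bytes
    headers: dict[str, str] = field(default_factory=dict)


# One record of data (here a single log line) with metadata carried in the headers.
event = Event(body=b"user=42 action=login", headers={"host": "web01"})
print(event.headers["host"], event.body.decode("utf-8"))
```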

Example 2 : Spool to HDFS

Learn how to use HDFS as a sink with Flume

Example 3: HTTP to HDFS

HTTP Sources can be pretty handy when you have an application capable of making POST requests.
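A minimal sketch of such a client, assuming the HTTP Source is running with its default JSON handler, which accepts a JSON array of events that each carry `headers` and `body`. The URL in the commented-out call is a placeholder, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_payload(records: list[tuple[dict[str, str], str]]) -> bytes:
    """Encode (headers, body) pairs as a JSON array of events."""
    events = [{"headers": headers, "body": body} for headers, body in records]
    return json.dumps(events).encode("utf-8")


def post_events(url: str, payload: bytes) -> int:
    """POST the payload to the given URL and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


payload = build_payload([({"source": "demo"}, "hello flume")])
# post_events("http://localhost:8989", payload)  # placeholder host and port
```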

Example 4: HTTP to HDFS with Event Bucketing

Event Headers in Flume carry useful metadata. Use event headers to bucket events in HDFS.
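The HDFS sink's path can contain `%{name}` placeholders that are filled in from the event header called `name`, which is what makes header-based bucketing possible. The Python sketch below only mimics that substitution to show the idea; the `country` header and the path are invented, and treating a missing header as an empty string is a simplification of this sketch rather than documented Flume behaviour.

```python
import re


def resolve_bucket_path(template: str, headers: dict[str, str]) -> str:
    """Replace each %{name} placeholder with the value of the header `name`."""
    def substitute(match: re.Match) -> str:
        # Assumption of this sketch: a missing header becomes an empty string.
        return headers.get(match.group(1), "")

    return re.sub(r"%\{([^}]+)\}", substitute, template)


path = resolve_bucket_path("/flume/events/%{country}", {"country": "IN"})
print(path)  # /flume/events/IN
```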

Example 5: Spool to HBase

Let's see how to use an HBase sink as the endpoint of the Flume Agent.

Example 6: Using multiple sinks and Channel selectors

HTTP to HDFS and Logger at the same time. See how to route events using channel selectors.
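Flume offers two kinds of channel selector: a replicating selector copies every event to all channels, while a multiplexing selector picks channels based on the value of one header. The sketch below models both decisions in Python; the channel names and the `type` header are hypothetical.

```python
def replicating(channels: list[str]) -> list[str]:
    """Replicating selector: every event goes to every channel."""
    return list(channels)


def multiplexing(
    headers: dict[str, str],
    header_name: str,
    mapping: dict[str, list[str]],
    default: list[str],
) -> list[str]:
    """Multiplexing selector: route on one header value, else use the default channels."""
    return mapping.get(headers.get(header_name), default)


# Both the HDFS channel and the logger channel receive the event.
print(replicating(["hdfs_ch", "log_ch"]))

# Only events whose "type" header is "error" are routed to the HDFS channel.
routes = {"error": ["hdfs_ch"]}
print(multiplexing({"type": "error"}, "type", routes, default=["log_ch"]))
print(multiplexing({"type": "info"}, "type", routes, default=["log_ch"]))
```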

Example 7: Twitter Source with Interceptors

Connect with the Twitter API using Flume. Use an Interceptor to do Regex filtering within Flume itself! 
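The filtering idea can be sketched in a few lines of Python, loosely modelled on Flume's regex filtering interceptor and its `excludeEvents` switch: events whose body matches the pattern are either kept or dropped. The function name and sample tweets are invented for illustration.

```python
import re


def regex_filter(bodies: list[str], pattern: str, exclude_events: bool = False) -> list[str]:
    """Keep events whose body matches the pattern, or drop them if exclude_events is True."""
    compiled = re.compile(pattern)
    kept = []
    for body in bodies:
        matched = compiled.search(body) is not None
        # Keep matches when including, keep non-matches when excluding.
        if matched != exclude_events:
            kept.append(body)
    return kept


tweets = ["learning #hadoop today", "lunch time", "flume and #hadoop"]
print(regex_filter(tweets, r"#hadoop"))
print(regex_filter(tweets, r"#hadoop", exclude_events=True))
```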

[For Linux/Mac OS Shell Newbies] Path and other Environment Variables

If you are unfamiliar with software that requires working in a shell/command-line environment, this video will be helpful. It explains how to update the PATH environment variable, which is needed to set up most Linux/Mac shell-based software.


Installing Sqoop

Install Sqoop and the connector for Sqoop to MySQL

Example 8: Sqoop Import from MySQL to HDFS
Example 9: Sqoop Import from MySQL to Hive
Example 10: Incremental Imports using Sqoop Jobs
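To give a feel for what these three examples involve, the Python sketch below assembles the corresponding `sqoop` command lines as argument lists. The flags used (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--hive-import`, `--incremental append`, `--check-column`, `--last-value`, and `sqoop job --create`) are standard Sqoop options, but the JDBC URL, user, table, paths and job name are placeholders and the helper functions are hypothetical.

```python
import shlex


def sqoop_import_args(jdbc_url, username, table, target_dir=None,
                      hive=False, check_column=None, last_value=None):
    """Build the argument list for a `sqoop import` invocation."""
    args = ["sqoop", "import", "--connect", jdbc_url,
            "--username", username, "-P",  # -P prompts for the password
            "--table", table]
    if target_dir is not None:
        args += ["--target-dir", target_dir]  # HDFS directory to write to
    if hive:
        args.append("--hive-import")  # load the imported data into Hive
    if check_column is not None:
        # Incremental append: only fetch rows whose check column exceeds the last value.
        args += ["--incremental", "append", "--check-column", check_column]
        if last_value is not None:
            args += ["--last-value", str(last_value)]
    return args


def sqoop_job_args(job_name, import_args):
    """Wrap an import in a saved job: sqoop job --create <name> -- import ..."""
    return ["sqoop", "job", "--create", job_name, "--"] + import_args[1:]


url = "jdbc:mysql://localhost/shop"  # placeholder database
to_hdfs = sqoop_import_args(url, "etl", "orders", target_dir="/data/orders")
to_hive = sqoop_import_args(url, "etl", "orders", hive=True)
incremental = sqoop_job_args(
    "orders_incremental",
    sqoop_import_args(url, "etl", "orders", target_dir="/data/orders",
                      check_column="id", last_value=0),
)
for command in (to_hdfs, to_hive, incremental):
    print(shlex.join(command))
```

Building the commands as lists rather than strings keeps quoting out of the picture until the final `shlex.join`, and the same lists could be passed to `subprocess.run` on a machine where Sqoop is installed.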
2 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion