4.28 out of 5
41 reviews on Udemy

Apache NiFi Complete Master Course – HDP – Automation ETL

Next Gen Data Flow. Process and distribute data using a powerful, reliable framework: Apache Nifi, Nifi Registry, Minifi
Instructor:
MUTHUKUMAR S
296 students enrolled
English [Auto-generated]
Apache Nifi (Niagara Files) basics to advanced concepts
Flowfile, Processor, Connection, Controller, Process Group, Input and Output Ports, Funnel, etc.
Installation, Security, Customization, Scalability of Apache Nifi
Develop simple to complex Dataflow and take it to production
Nifi Registry - Dataflow registry
Hortonworks DataFlow HDF
Integrate with Kafka, NoSQL databases, RDBMS, file systems, etc.
Process different types of files such as CSV, JSON, and text files

Apache Nifi is a next-generation framework for creating data pipelines and integrating with almost all popular systems in the enterprise. It has more than 250 processors and more than 70 controllers.

This course covers all the basic to advanced concepts available in Apache Nifi, such as

  • Flowfile

  • Controllers

  • Processors

  • Connections

  • Process Group

  • Funnel

  • Data Provenance

  • Processor relationships

  • Input and Output Ports

This course also covers Apache Nifi subprojects such as

  • Nifi Registry

  • MiniFi

As part of production maintenance, users may have to make careful decisions to improve performance and handle errors efficiently. To support this, the demos also cover

  • Handling Throughput and Latency

  • Handling Back Pressure and Yield

  • Error handling

  • Failure Retry

  • Monitoring Bulletin

  • Data Provenance

For a seamless experience with data, handling data latency and throughput and prioritizing data are important. These are controlled with relationships, yield, and back pressure.

Various processors and controllers for processing different types of data are demonstrated.

Processors used in production scenarios, such as HTTP, RDBMS, NoSQL, S3, CSV, JSON, and Hive, are covered in detail with demos, along with controllers such as SSL and ConnectionPool.

All these concepts are covered with demos, and real-time implementations are provided.

For easy practice, all the demonstrated flow templates are uploaded as part of the course.

Demo on creating and using a Key Store and Trust Store for SSL communication.

Using Maven and Eclipse EE to build a custom processor and deploy the nar file to the Nifi lib folder.

Introduction to Apache Nifi

1
Introduction

Introduction to this course.

2
Apache Nifi Introduction
  • Basic understanding of the Apache Nifi project

  • Why Apache Nifi?

  • Comparison with other ETL tools

  • Features that differentiate Apache Nifi from other ETL tools

3
Dataflow Introduction - Key Features
  • Understand what a dataflow is

  • Overview of the Apache Nifi UI and its features

  • Dataflow and challenges

  • Apache Nifi key features

  • Nifi role in Push and Pull architecture

4
Basic Installation
  • Install Apache Nifi on Windows

  • Start Apache Nifi

  • Open Apache Nifi UI in browser

5
Terminology Introduction
  • Get to know about various terminologies like

  1. FlowFile

  2. FlowFile Processor

  3. Connection

  4. Flow Controller

  5. Process Group

  • Create a simple workflow

  • Play with Flowfile generator


6
UI Introduction - Play with Apache Nifi User Interface

Understand the various sections of the UI, such as

  • Components Toolbar

  • Global Menu

  • Search

  • Status Bar

  • Navigate Palette

  • Operate Palette

First Baby Step - Flow file Demo

1
Create Simple Workflow
  • Create a simple Flow

  • Introduction to GetFile and PutFile Processor

  • Processor Configuration

  • Connection Configuration

  • Relationship Termination


Processors and Connections

1
Processor Category

Understand the various processor categories, such as

  • Data Ingestion

  • Routing and Mediation

  • Database Access

  • Attribute Extraction

  • System Interaction

  • Data Transformation

  • Sending Data

  • HTTP Access

  • AWS Cloud Access

2
Connection configuration

Understand the various connection configuration options, such as

  • Flowfile Expiration

  • Back Pressure

  • Object Threshold

  • Size Threshold

  • Prioritization

  • Various options in connection context menu

  • Queue monitoring

  • Using Queue Empty Options
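
To make the back pressure thresholds above concrete, here is a minimal Python sketch of the idea. It is a toy model, not Nifi's implementation: the `Connection` class and its method names are hypothetical. The defaults mirror Nifi's usual starting values of 10,000 objects and 1 GB, and reaching either threshold is treated as enough to stop the upstream processor from adding more data.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Toy model of a queue between two processors (hypothetical, not Nifi code)."""
    object_threshold: int = 10_000          # queued flowfiles that trigger back pressure
    size_threshold_bytes: int = 1024 ** 3   # queued bytes that trigger back pressure
    queue: deque = field(default_factory=deque)
    queued_bytes: int = 0

    def back_pressure_applied(self) -> bool:
        # Reaching either threshold is enough to pause the upstream processor.
        return (len(self.queue) >= self.object_threshold
                or self.queued_bytes >= self.size_threshold_bytes)

    def offer(self, content: bytes) -> bool:
        """Enqueue content unless back pressure is active; report whether it was accepted."""
        if self.back_pressure_applied():
            return False
        self.queue.append(content)
        self.queued_bytes += len(content)
        return True

    def poll(self) -> bytes:
        """Dequeue the oldest item (first in, first out), freeing capacity."""
        content = self.queue.popleft()
        self.queued_bytes -= len(content)
        return content
```

With `Connection(object_threshold=2, size_threshold_bytes=100)`, a third `offer` is refused until a `poll` frees a slot, which is the behaviour the Object Threshold setting is meant to give.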


3
Processor Configuration Settings

General configuration options on the processor Settings tab

  • Penalty Duration

  • Yield Duration

  • Bulletin Level

  • Relationship Termination

4
Processor Configuration Scheduling

Options on the processor Scheduling tab

  • Different scheduling strategies

  • Relationship between latency and throughput

  • Concurrent task configuration

  • Run Schedule configuration

  • Execution mode

5
Processor Configuration Property
  • Managing properties of various processors

  • Customizing mandatory and non-mandatory properties

  • Error handling on missing properties

Next Step into Flowfile

1
Working with Attributes
  • Changing the payload attributes

  • Making decisions based on flowfile attributes

  • Logging the attributes in a log file

  • Monitoring log attributes
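
As a rough illustration of making decisions based on flowfile attributes, the sketch below routes an attribute map to a named relationship. It is plain Python, not Nifi code: the function, the rule names, and the use of `filename` and `fileSize` as keys are assumptions chosen for the example, and `unmatched` is used as the fallback relationship.

```python
from typing import Callable, Dict

Attributes = Dict[str, str]
Rules = Dict[str, Callable[[Attributes], bool]]


def route_on_attribute(attributes: Attributes, rules: Rules) -> str:
    """Return the name of the first rule whose predicate matches, else 'unmatched'."""
    for relationship, predicate in rules.items():
        if predicate(attributes):
            return relationship
    return "unmatched"


# Hypothetical rules: relationship name -> test on the attribute map.
RULES: Rules = {
    "csv": lambda attrs: attrs.get("filename", "").endswith(".csv"),
    "large": lambda attrs: int(attrs.get("fileSize", "0")) > 1_000_000,
}
```

Rules are checked in insertion order, so a large CSV file is routed to `csv` here; reorder the dictionary if size should take priority.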

2
Log Configuration and Monitoring Logs
  • Customizing log file configuration

  • Logging attributes in a separate log file

3
Handling Failures
  • Failure handling by processor

  • Retrying failed flowfiles

  • Monitoring failure queue

  • Check failure message from bulletin
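
One common way to think about retrying failed flowfiles is a counter carried in an attribute: the flowfile loops back until a limit is reached, then goes to a failure path. The sketch below shows that idea in plain Python. The attribute name `retry.count`, the relationship names, and the limit of 3 are hypothetical and not taken from the course.

```python
from typing import Dict, Tuple


def handle_failure(attributes: Dict[str, str],
                   max_retries: int = 3) -> Tuple[str, Dict[str, str]]:
    """Decide whether a failed flowfile is retried or sent to the failure path."""
    attempts = int(attributes.get("retry.count", "0"))
    if attempts < max_retries:
        # Copy the attributes so the caller's map is left untouched.
        updated = dict(attributes)
        updated["retry.count"] = str(attempts + 1)
        return "retry", updated
    return "failure", attributes
```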

4
Working With Templates
  • Purpose and use of Templates

  • Creating Templates

  • Managing Templates

  • Uploading Templates

  • Template file structure

  • Handling sensitive information in Templates

Integrating Apache Nifi with Distributed Messaging System - Apache Kafka

1
Apache Kafka Quick Introduction and Demo

  • Understand Apache Kafka

  • Install Kafka

  • Create Topic

  • Publish Message to Topic

  • Read Message from Topic

2
Nifi As Producer
  • Create message with Flowfile

  • Post message to Kafka Topic

  • Read message using Kafka console consumer

3
Nifi As Consumer
  • Post message using Kafka Producer

  • Read message from Topic using Apache Nifi

  • Convert message to Flowfile

Process group and Funnel

1
Process group - Input and Output ports
  • Purpose of process group

  • Input and Output Ports

  • Create and Use Process groups

2
Funnel Forking
  • Create Funnel

  • Understand the forking concept and its use

  • Fork a flowfile to multiple processors


3
Funnel Combine
  • Understand the combine (fan-in) concept and its use

  • Combine flowfiles from multiple processors

Monitoring and Provenance

1
Nifi Monitoring and Statistics

Monitor various statistical information about

  • Processors

  • Input Ports

  • Output Ports

  • Remote Process Group

  • Connections

  • Process Groups

Observing overall Bulletin

Data Provenance

Nifi history


2
Data Provenance
  • Purpose and Usage of Data Provenance

  • Provenance data lineage

  • Detailed event analysis

  • View/Download input and output claim

  • Replay / Retry events

  • Observe failed queues

  • Observe modified attributes as part of event

Structured Data Processing

1
Read MySQL Table data as Avro and JSON
  • Connect and Read data from MySQL database

  • Use of Avro and JSON

  • Using Connection Pool Controller

2
Transform CSV to JSON
  • Using AvroSchemaRegistry

  • Using CSVReader

  • Using JsonRecordSetWriter

  • Using Custom Schema of CSV

  • Monitor updated attribute using data provenance
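
The lesson above pairs a CSV reader, a JSON writer, and a schema. As a loose analogy in plain Python (not Nifi code), the sketch below reads CSV text, applies a schema that says how each column should be typed, and writes JSON. The three-column schema (`id`, `name`, `amount`) and the function name are hypothetical.

```python
import csv
import io
import json
from typing import Callable, Dict

# Hypothetical schema: column name -> function that converts the CSV string.
SCHEMA: Dict[str, Callable[[str], object]] = {
    "id": int,
    "name": str,
    "amount": float,
}


def csv_to_json(csv_text: str, schema: Dict[str, Callable[[str], object]]) -> str:
    """Parse CSV text with a header row and emit a JSON array of typed records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [
        # Only columns named in the schema are kept, each cast to its declared type.
        {column: cast(row[column]) for column, cast in schema.items()}
        for row in reader
    ]
    return json.dumps(records)
```

Without the schema every value would come out as a string, which is the main reason a schema is supplied alongside the reader and writer.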

3
Managing state with MySQL and Incremental Fetch
  • Purpose of state management

  • Reading only delta (newly added) records from an RDBMS table

  • Using the maximum-value column property to manage state
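
The maximum-value column idea can be sketched outside Nifi: remember the largest value seen so far and ask only for rows above it on the next run. The example below uses Python's built-in `sqlite3` in place of MySQL so that it runs anywhere. The `orders` table, the `id` column, and the `fetch_delta` function are hypothetical.

```python
import sqlite3
from typing import Dict, List


def fetch_delta(conn: sqlite3.Connection, state: Dict[str, int],
                table: str = "orders", max_value_column: str = "id") -> List[sqlite3.Row]:
    """Return rows whose max-value column exceeds the stored state, then advance the state."""
    last_seen = state.get(max_value_column, -1)  # -1 assumes non-negative values
    # Identifiers cannot be bound as parameters, so they are interpolated here;
    # only pass trusted table and column names.
    query = (f"SELECT * FROM {table} WHERE {max_value_column} > ? "
             f"ORDER BY {max_value_column}")
    rows = conn.execute(query, (last_seen,)).fetchall()
    if rows:
        state[max_value_column] = rows[-1][max_value_column]
    return rows


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("pen",), ("ink",)])

state: Dict[str, int] = {}
first_run = fetch_delta(conn, state)    # both existing rows
conn.execute("INSERT INTO orders (item) VALUES (?)", ("pad",))
second_run = fetch_delta(conn, state)   # only the newly inserted row
```

This approach only notices rows whose tracked column grows, so updates to existing rows need a column such as a last-modified timestamp instead of an ID.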

4
Transform CSV to JSON using dynamic schema
  • Creating sample data with Mockaroo

  • Using dynamic schema

  • Purpose and real-time use of dynamic schema

Nifi Registry

1
Apache Nifi Registry - Introduction
  • Nifi Registry introduction

  • Purpose of Nifi Registry

  • Installing Nifi Registry

  • Starting the Nifi Registry service

  • Creating and managing buckets


2
Nifi Registry as Version Control System
  • Connecting Nifi Registry with a flow

  • Adding a flow to a bucket

  • Maintaining versions of a flow

  • Committing and changing versions of a flow

  • Checking the version history of a flow

  • Rollback changes

Nifi Cluster

1
Cluster Installation and Configuration
  • Install 3 node cluster

  • Understand the roles and responsibilities of the Primary Node and Cluster Coordinator

  • Using a Zookeeper quorum and its configuration

  • Configuration changes to nifi.properties and state-management.xml

  • Update Zookeeper connection string

  • Starting nodes and cluster

  • Verification of primary node election

2
Cluster Flow File Demo
  • Cluster overview on Nodes, Systems, JVM, Storage and Versions

  • Create sample flow file

  • Monitor status history

  • Executing a processor on the primary node and on all nodes

Nifi and Big Data Ecosystem

1
Nifi HDFS Interaction
  • HDFS Overview

  • HDFS configuration files - hdfs-site.xml, core-site.xml

  • Configuring PutHDFS processor

  • Adding a file to HDFS

  • Browse the files in HDFS through HDFS browser

2
Nifi Hive Interaction
  • Apache Hive, HiveQL, Hive metastore overview

  • Sample HiveQL

  • Using HiveQL with HUE

  • Creating HiveConnectionPool Controller service

  • Setup connection pool, Hive configuration files, userid, password

  • Configure SelectHiveQL processor

  • Reading data from Hive and writing it as JSON file


HTTP Processors

1
HTTP Processor Introduction
  • HTTP Protocol Overview

  • Different HTTP Processors

  • Client Server architecture overview

2
GetHTTP Processor
  • GetHTTP processor introduction

  • Monitoring response in Bulletin board

  • Logging response payload

3
PostHTTP Processor and SSL Context Service setup
  • Configuring StandardSSLContext Service

  • Concepts of Key Store and Trust Store

  • Generating a key pair using keytool

  • Making secure call using https

4
ListenHTTP Processor
  • Understand ListenHTTP Processor

  • Create an HTTP endpoint

  • Verify ListenHTTP Processor with curl

  • Verify by passing custom params and custom header

5
InvokeHTTP Processor without SSL
  • Understand InvokeHTTP Processor

  • Configure InvokeHTTP processor to connect with Google Search

  • Configure HTTP method and remote URL

  • Verify response code

  • Add response body as an attribute

6
InvokeHTTP with SSL
  • Understand InvokeHTTP Processor

  • Understand Private/Public key concept

  • Learn concept of Keystore, Trust store

  • Use keytool to import website public key

  • Add public key of secure website to Trust store

  • Use Trust store and configure StandardSSLContextService

  • Call secure website and verify response

Nifi and AWS

1
AWS S3 add Object with PutS3Object Processor
  • Overview on Amazon Web Services (AWS) S3 Object Storage

  • Creating a new bucket

  • Creating User, Groups and Permissions

  • Generate access key and secret access key to set in PutS3Object processor

  • Configure PutS3Object processor

  • Adding files to S3 and verifying it

2
AWS S3 list objects with ListS3 Processor
  • Configure ListS3 Processor

  • Listing files in an S3 bucket

  • Verifying the list of files with logs

3
AWS S3 add object - using AWS Controller service
  • Understand AWSCredentialsProviderControllerService and its configuration

  • Use PutS3Object processor and leverage AWSCredentialsProviderControllerService

  • Understand advantages of using controller service

Nifi and NoSQL Database

1
MongoDB - put records with Nifi
  • Understand NoSQL - MongoDB database

  • Configure MongoDB database, users and collections

  • Configure PutMongoRecord processor

  • Generate sample CSV files from the Mockaroo website

  • Add CSV files as a collection in the MongoDB database and verify them

Nifi and Apache Solr

1
Apache Solr Introduction, Installation and Configuration
  • Introduction to Apache Solr

  • Installation of Apache Solr

  • Creating core in Solr

  • Starting and Stopping Apache Solr

  • Overview of configuration files and folder structure

  • Overview of query editor

2
Apache Solr Content Stream Demo
  • Configuring PutSolrContentStream Processor

  • Generate custom message to add to Solr

  • Add flowfiles as stream to Solr

  • View and list data added to core in Solr

Custom Processor and Custom Controller

1
Project setup with Maven and Eclipse
  • Introduction to Maven

  • Introduction to Eclipse EE

  • Creating Maven artifact

  • Overview of Maven POM file

  • Converting Maven artifact to Eclipse project

  • Importing project into Eclipse

  • Code Structure and Build path overview

2
Build project and generate nar file
  • Modify code for attributes

  • Include code for relationships

  • Include code to handle the flowfile

  • Handle errors in Eclipse

  • Clean and build project

  • Copy nar file from target to nifi lib folder

3
Create sample workflow with custom processor and validate
  • Add custom processor to workflow

  • Add required attributes

  • Create complete workflow

  • Verify functionality of custom processor

4
Create Custom Controller
  • Generate custom controller artifact using Maven

  • Overview of folder structure

  • Importing projects into Eclipse to customize

  • Generate controller and controller API nar files

  • Add newly generated controller service, configure and enable it

5
Use custom controller within custom processor
  • Generate custom processor artifact

  • Copy required source of custom controller

  • Configure project and pom files to include custom controller

  • Import all projects into Eclipse

  • Customize the project to use custom controller with custom processor

  • Verify usage of custom controller with custom processor

Practical Use Cases

1
Use case 1 : Extract data from fordgobike, transform and store it in HDFS as CSV
  • Extract station and vehicle data from the fordgobike share website

  • Extract only required data and split the records

  • Transform the data from JSON format to CSV records

  • Learn about JSONPath to extract data

  • Merge multiple flowfiles into a single flowfile based on an attribute

  • Store transformed csv file in HDFS

  • Processors used : InvokeHttp, SplitJson, EvaluateJsonPath, ReplaceText, MergeContent, PutHDFS
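
The split, extract, transform, and merge steps of this use case can be pictured with a small plain-Python sketch. It is not the Nifi flow itself. The payload layout (`data.stations`), the field names, and the function name are assumptions made for illustration; the real feed may be shaped differently.

```python
import json
from collections import defaultdict
from typing import Dict, List


def stations_to_csv(payload: str, fields: List[str], group_by: str) -> Dict[str, str]:
    """Split a JSON payload into records, keep selected fields, and merge CSV lines per group."""
    stations = json.loads(payload)["data"]["stations"]  # assumed payload layout
    merged: Dict[str, List[str]] = defaultdict(list)
    for station in stations:                                      # split into records
        # Naive join for the sketch; use the csv module if values may contain commas.
        line = ",".join(str(station[name]) for name in fields)    # extract and format
        merged[station[group_by]].append(line)                    # merge on an attribute
    return {key: "\n".join(lines) for key, lines in merged.items()}


# Hypothetical sample resembling a station feed.
SAMPLE = json.dumps({"data": {"stations": [
    {"station_id": "1", "name": "A", "region_id": "3"},
    {"station_id": "2", "name": "B", "region_id": "3"},
    {"station_id": "3", "name": "C", "region_id": "5"},
]}})
```

Calling `stations_to_csv(SAMPLE, ["station_id", "name"], "region_id")` yields one CSV body per region, which corresponds to merging flowfiles that share an attribute value before writing them out.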

2
Use Case 2 : Part 1 : Extract Twitter data to Apache Solr
  • Creating Twitter app

  • Configure Consumer API keys, Access token and access token secret

  • Extract required fields from tweets

  • Store and verify tweets in Apache Solr

3
Use Case 2 : Part 2 : Visualize Twitter data using Banana Dashboard
  • Introduction to Banana dashboard

  • Configure and import dashboard

  • Creating panel and rows

  • Configure panel with different graph visualization options

Reference Resources

1
Test Data used

Bonus Lecture

1
Special coupon to join my other courses
4.3 out of 5
41 Ratings

Detailed Rating

Stars 5
17
Stars 4
16
Stars 3
6
Stars 2
1
Stars 1
1
30-Day Money-Back Guarantee

Includes

5 hours on-demand video
2 articles
Full lifetime access
Access on mobile and TV
Certificate of Completion