Apache Nifi is a next-generation framework for building data pipelines and integrating with almost all popular enterprise systems. It has more than 250 processors and more than 70 controllers.
This course covers basic to advanced concepts available in Apache Nifi, such as:
Flowfile
Controllers
Processors
Connections
Process Group
Funnel
Data Provenance
Processor relationships
Input and Output Ports
This course also covers Apache Nifi subprojects such as:
Nifi Registry
MiniFi
As part of production maintenance, users may have to make careful decisions to improve performance and handle errors efficiently. To support this, the demos also cover:
Handling Throughput and Latency
Handling Back Pressure and Yield
Error handling
Failure Retry
Monitoring Bulletin
Data Provenance
For a seamless experience with data, it is important to manage latency and throughput and to prioritize the data. These are controlled with relationships, yield, and back pressure.
Various processors and controllers for processing different types of data are demonstrated.
Processors used in production scenarios, such as HTTP, RDBMS, NoSQL, S3, CSV, JSON, and Hive, are covered in detail, along with controllers such as SSL and ConnectionPool, with demos.
All these concepts are covered with demos, and real-time implementations are provided.
For easy practice, all the demonstrated flow templates are uploaded as part of the course.
Demo on creating and using a KeyStore and a Trust Store for SSL communication.
Using Maven and Eclipse EE to build a custom processor and deploying the nar file to the Nifi lib folder.
Introduction to Apache Nifi
Introduction to this course.
Basic understanding of the Apache Nifi project
Why Apache Nifi?
Comparison with other ETL tools
Features which differentiate Apache Nifi from other ETL tools
Understand what a dataflow is
Overview of the Apache Nifi UI and its features
Dataflow and challenges
Apache Nifi key features
Nifi's role in push and pull architectures
Install Apache Nifi on Windows
Start Apache Nifi
Open Apache Nifi UI in browser
Get to know key terms such as
FlowFile
FlowFile Processor
Connection
Flow Controller
Process Group
Create a simple workflow
Play with Flowfile generator
Understand the various sections of the UI, such as
Components Toolbar
Global Menu
Search
Status Bar
Navigate Palette
Operate Palette
First Baby Step - Flow file Demo
Create a simple Flow
Introduction to GetFile and PutFile Processor
Processor Configuration
Connection Configuration
Relationship Termination
Processors and Connections
Understand the various processor categories, such as
Data Ingestion
Routing and Mediation
Database Access
Attribute Extraction
System Interaction
Data Transformation
Sending Data
HTTP Access
AWS Cloud Access
Understand the various connection configuration options, such as
Flowfile Expiration
Back Pressure
Object Threshold
Size Threshold
Prioritization
Various options in connection context menu
Queue monitoring
Using Queue Empty Options
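As a rough illustration of the back pressure settings listed above, the sketch below models a connection queue that stops accepting items once either an object-count threshold or a data-size threshold is reached. This is not Nifi code: the class name `ConnectionQueue`, its methods, and the threshold values are hypothetical and chosen only for illustration. In Nifi itself, back pressure stops the upstream component from being scheduled rather than rejecting individual flowfiles, so the `offer` method here is only an approximation of that behavior.

```python
from collections import deque


class ConnectionQueue:
    """Toy model of a connection with back pressure thresholds (hypothetical, not Nifi code)."""

    def __init__(self, object_threshold, size_threshold_bytes):
        self.object_threshold = object_threshold
        self.size_threshold_bytes = size_threshold_bytes
        self._queue = deque()
        self._queued_bytes = 0

    def back_pressure_engaged(self):
        # Back pressure applies as soon as EITHER threshold is reached.
        return (len(self._queue) >= self.object_threshold
                or self._queued_bytes >= self.size_threshold_bytes)

    def offer(self, content):
        """Enqueue content unless back pressure is engaged; return True if accepted."""
        if self.back_pressure_engaged():
            return False
        self._queue.append(content)
        self._queued_bytes += len(content)
        return True

    def poll(self):
        """Dequeue the oldest item (first in, first out), or return None if empty."""
        if not self._queue:
            return None
        content = self._queue.popleft()
        self._queued_bytes -= len(content)
        return content


# Assumed thresholds for the demo: 3 objects or 1024 bytes.
queue = ConnectionQueue(object_threshold=3, size_threshold_bytes=1024)
accepted = [queue.offer(b"x" * 10) for _ in range(5)]
print(accepted)  # [True, True, True, False, False]
```

Once the downstream side drains the queue with `poll()`, `offer()` starts succeeding again, which mirrors how an upstream processor resumes when the queue falls below its thresholds.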
Various general configuration options in the processor Settings tab
Penalty Duration
Yield Duration
Bulletin Level
Relationship Termination
Various options in the processor Scheduling tab
Different Scheduling Strategy
Relationship between latency and throughput
Concurrent task configuration
Run Schedule configuration
Execution mode
Managing properties of various processors
Customizing mandatory and non-mandatory properties
Error handling on missing properties
Next Step into Flowfile
Changing the payload attributes
Making decisions based on flowfile attributes
Logging the attributes in the log file
Monitoring log attributes
Customizing log file configuration
Logging attributes in separate log file
Failure handling by processor
Retry failed Flowfile
Monitoring failure queue
Check failure message from bulletin
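The retry and failure handling topics above can be sketched as a small routing decision driven by a flowfile attribute. This is a minimal sketch and not Nifi code: the `FlowFile` dataclass, the attribute name `retry.count`, the relationship names `retry` and `failure`, and the limit of three attempts are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a flowfile: content plus string attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


MAX_RETRIES = 3  # assumed retry limit for this sketch


def route_failed(flowfile, max_retries=MAX_RETRIES):
    """Return 'retry' or 'failure' and update the retry counter attribute."""
    attempts = int(flowfile.attributes.get("retry.count", "0"))
    if attempts < max_retries:
        # Attributes are stored as strings, so the counter is converted back.
        flowfile.attributes["retry.count"] = str(attempts + 1)
        return "retry"
    return "failure"


ff = FlowFile(content=b"payload")
decisions = [route_failed(ff) for _ in range(4)]
print(decisions)       # ['retry', 'retry', 'retry', 'failure']
print(ff.attributes)   # {'retry.count': '3'}
```

Keeping the counter on the flowfile itself means the decision travels with the data, so the same check works no matter which queue the flowfile comes back from.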
Purpose and use of Templates
Creating Templates
Managing Templates
Uploading Templates
Template file structure
Handling sensitive information in Templates
Integrating Apache Nifi with Distributed Messaging System - Apache Kafka
Understand Apache Kafka
Install Kafka
Create Topic
Publish Message to Topic
Read Message from Topic
Create message with Flowfile
Post message to Kafka Topic
Read message using Kafka console consumer
Post message using Kafka Producer
Read message from Topic using Apache Nifi
Convert message to Flowfile
Process group and Funnel
Purpose of process group
Input and Output Ports
Create and Use Process groups
Create Funnel
Understand the forking concept and its use
Fork a flowfile to multiple processors
Understand the combine (fan-in) concept and its use
Combine flowfiles from multiple processors
Monitoring and Provenance
Monitor various statistical information about
Processors
Input Ports
Output Ports
Remote Process Group
Connections
Process Groups
Observing overall Bulletin
Data Provenance
Nifi history
Purpose and Usage of Data Provenance
Provenance data lineage
Detailed event analysis
View/Download input and output claim
Replay / Retry events
Observe failed queues
Observe modified attributes as part of event
Structured Data Processing
Connect and Read data from MySQL database
Use of Avro and JSON
Using Connection Pool Controller
Using AvroSchemaRegistry
Using CSVReader
Using JsonRecordSetWriter
Using a custom schema for CSV
Monitor updated attribute using data provenance
Purpose of state management
Reading only Delta records from RDBMS table
Using maximum-value Column property to manage state
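The idea behind reading only delta records can be sketched outside Nifi: remember the largest value seen in a chosen column and, on the next run, fetch only rows above it. The sketch below uses an in-memory SQLite table instead of MySQL so that it stays self-contained. The table `orders`, the column `id`, and the `state` dictionary are hypothetical; Nifi keeps the equivalent value in its own state management rather than in a dictionary.

```python
import sqlite3


def fetch_delta(conn, state):
    """Return rows with id above the stored maximum, then advance the stored maximum."""
    last_seen = state.get("max_id", 0)
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id", (last_seen,)
    ).fetchall()
    if rows:
        # Rows are ordered by id, so the last row carries the new maximum.
        state["max_id"] = rows[-1][0]
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (id, item) VALUES (?, ?)", [(1, "pen"), (2, "ink")])

state = {}
first_run = fetch_delta(conn, state)     # both existing rows
conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (3, "pad"))
second_run = fetch_delta(conn, state)    # only the newly inserted row
third_run = fetch_delta(conn, state)     # nothing new since the last run

print(first_run)    # [(1, 'pen'), (2, 'ink')]
print(second_run)   # [(3, 'pad')]
print(third_run)    # []
```

This approach only detects new rows when the chosen column always increases, which is why an auto-increment key or a last-modified timestamp is the usual choice.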
Creating sample data from Mockaroo
Using dynamic schema
Purpose and real-time use of dynamic schema
Nifi Registry
Nifi - Registry Introduction
Purpose of Nifi Registry
Installing Nifi Registry
Starting the Nifi Registry service
Creating and managing buckets
Connecting Nifi Registry with a flow
Adding a flow to a bucket
Maintaining versions of a flow
Committing and changing the version of a flow
Checking the version history of a flow
Rollback changes
Nifi Cluster
Install a 3-node cluster
Understand the roles and responsibilities of the Primary Node and the Cluster Coordinator
Using Zookeeper Quorum and configuration
Configuration change of nifi.properties and state-management.xml
Update Zookeeper connection string
Starting nodes and cluster
Verification of primary node election
Cluster overview of Nodes, Systems, JVM, Storage and Versions
Create a sample flow
Monitor status history
Execution of processor in Primary node and in All nodes
Nifi and Big Data Ecosystem
HDFS Overview
HDFS configuration files - hdfs-site.xml, core-site.xml
Configuring PutHDFS processor
Adding a file to HDFS
Browse the files in HDFS through HDFS browser
Apache Hive, HiveQL, Hive metastore overview
Sample HiveQL
Using HiveQL with HUE
Creating HiveConnectionPool Controller service
Setup connection pool, Hive configuration files, userid, password
Configure SelectHiveQL processor
Reading data from Hive and writing it as JSON file
HTTP Processors
HTTP Protocol Overview
Different HTTP Processors
Client Server architecture overview
GetHTTP processor introduction
Monitoring response in Bulletin board
Logging response payload
Configuring StandardSSLContext Service
Concepts of Key Store and Trust Store
Generating a key pair using keytool
Making a secure call using HTTPS
Understand ListenHTTP Processor
Create an HTTP endpoint
Verify ListenHTTP Processor with curl
Verify by passing custom params and custom header
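The curl verification above could equally be scripted. The sketch below builds a POST request with a custom query parameter and a custom header using only the Python standard library. The host, the port `8081`, the base path `contentListener`, and the parameter and header names are assumptions for illustration; substitute the values configured on your own ListenHTTP processor.

```python
import urllib.parse
import urllib.request


def build_request(base_url, payload, params=None, headers=None):
    """Build a POST request carrying the payload, query parameters, and extra headers."""
    url = base_url
    if params:
        url = f"{base_url}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, data=payload, headers=headers or {}, method="POST")


# Assumed endpoint: adjust host, port, and base path to match the ListenHTTP processor.
request = build_request(
    "http://localhost:8081/contentListener",
    payload=b"hello nifi",
    params={"source": "demo"},
    headers={"X-Demo-Header": "custom-value"},
)
print(request.get_method(), request.full_url)
# POST http://localhost:8081/contentListener?source=demo

# Sending is left commented out because it needs a running ListenHTTP processor:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before pointing the script at a live endpoint.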
Understand InvokeHTTP Processor
Configure InvokeHTTP processor to connect with Google Search
Configure the HTTP method and remote URL
Verify response code
Add the response body as an attribute
Understand InvokeHTTP Processor
Understand Private/Public key concept
Learn concept of Keystore, Trust store
Use keytool to import website public key
Add public key of secure website to Trust store
Use Trust store and configure StandardSSLContextService
Call secure website and verify response
Nifi and AWS
Overview on Amazon Web Services (AWS) S3 Object Storage
Creating a new bucket
Creating User, Groups and Permissions
Generate access key and secret access key to set in the PutS3Object processor
Configure PutS3Object processor
Adding files to S3 and verifying it
Configure ListS3 Processor
Listing files in an S3 bucket
Verifying the list of files with logs
Understand AWSCredentialsProviderControllerService and its configuration
Use PutS3Object processor and leverage AWSCredentialsProviderControllerService
Understand advantages of using controller service
Nifi and NoSQL Database
Understand NoSQL - MongoDB database
Configure MongoDB database, users and collections
Configure PutMongoRecord processor
Generate sample CSV files from the Mockaroo website
Add CSV files as collections in the MongoDB database and verify them
Nifi and Apache Solr
Introduction to Apache Solr
Installation of Apache Solr
Creating core in Solr
Starting and Stopping Apache Solr
Overview of configuration files and folder structure
Overview of query editor
Configuring PutSolrContentStream Processor
Generate custom message to add to Solr
Add flowfiles as stream to Solr
View and list data added to core in Solr
Custom Processor and Custom Controller
Introduction to Maven
Introduction to Eclipse EE
Creating Maven artifact
Overview of Maven POM file
Converting the Maven artifact to an Eclipse project
Importing project into Eclipse
Code Structure and Build path overview
Modify code for attributes
Include code for relationships
Include code to handle flowfiles
Handle errors in Eclipse
Clean and build project
Copy nar file from target to nifi lib folder
Add custom processor to workflow
Add required attributes
Create complete workflow
Verify functionality of custom processor
Generate custom controller artifact using Maven
Overview of folder structure
Importing projects into Eclipse to customize
Generate controller and controller API nar files
Add newly generated controller service, configure and enable it
Generate custom processor artifact
Copy required source of custom controller
Configure project and pom files to include custom controller
Import all projects into Eclipse
Customize the project to use custom controller with custom processor
Verify usage of custom controller with custom processor
Practical Use Cases
Extract station and vehicle data from the Ford GoBike bike share website
Extract only required data and split the records
Transform the data from JSON format to CSV records
Learn about JSONPath to extract data
Merge multiple flowfiles into a single flowfile based on an attribute
Store the transformed CSV file in HDFS
Processors used: InvokeHTTP, SplitJson, EvaluateJsonPath, ReplaceText, MergeContent, PutHDFS
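The transformation steps of this use case can be approximated in plain Python: split a JSON document into records, keep only the needed fields, and merge the results into one CSV. This is a sketch of the idea rather than the Nifi flow itself. The sample payload and the field names `station_id`, `name`, and `capacity` are made up for illustration and are not claimed to match the real feed.

```python
import csv
import io
import json

# Made-up sample payload standing in for the HTTP response.
SAMPLE = json.dumps({
    "data": {
        "stations": [
            {"station_id": "1", "name": "Market St", "capacity": 20, "extra": "x"},
            {"station_id": "2", "name": "Main St", "capacity": 15, "extra": "y"},
        ]
    }
})

FIELDS = ["station_id", "name", "capacity"]  # assumed fields of interest


def json_to_csv(document, fields=FIELDS):
    """Split the JSON array into records, keep selected fields, and emit one CSV text."""
    records = json.loads(document)["data"]["stations"]                  # split
    rows = [{name: record[name] for name in fields} for record in records]  # extract
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)                                              # merge
    return buffer.getvalue()


csv_text = json_to_csv(SAMPLE)
print(csv_text)
# station_id,name,capacity
# 1,Market St,20
# 2,Main St,15
```

In the flow, each of these stages is a separate processor connected by queues, whereas the sketch collapses them into a single function for readability.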
Creating a Twitter app
Configure consumer API keys, access token and access token secret
Extract required fields from tweets
Store and verify tweets in Apache Solr
Introduction to Banana dashboard
Configure and import dashboard
Creating panels and rows
Configure panel with different graph visualization options