3.72 out of 5
9 reviews on Udemy

New in Big Data: Hive, HiveMall, AWS Lambda, Solr, Kibana

Big Data ETL, Machine Learning and Data Visualization. Concise hands-on course with full code examples. Learn to excel!
Instructor:
Elena Akhmatova
130 students enrolled
craft a solution to your BigData tasks using building blocks shown in the course
create deliverables for your work in the form of an Amazon microservice, a Search Engine web service, or a Dashboard
learn new technologies: AWS Lambda, Hivemall Machine Learning Library, HyperLogLog cardinality estimation technique, connecting Apache Hive to Solr and Hue, writing custom Hive UDFs, and a few other things
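
For readers new to HyperLogLog, here is a minimal, illustrative sketch of the cardinality estimation idea in plain Python. It is not the third-party Hive UDF covered in the course; the class name, the choice of SHA-1 as the hash, and the default precision `p=12` are assumptions made for this example only.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog: estimates the number of distinct items in fixed memory."""

    def __init__(self, p=12):
        self.p = p                       # number of bits used to pick a register
        self.m = 1 << p                  # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Take a 64-bit hash of the item (SHA-1 is an assumption of this sketch).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                     # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Harmonic mean of 2^register across all registers, scaled by alpha.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(50000):
    hll.add(f"user-{i}")
approx_distinct = hll.count()   # an estimate, not an exact count
```

With 4096 registers the typical relative error is around 1.6 percent, and adding the same items again does not change the estimate, which is what makes the technique attractive for distinct counts over large tables.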

This course is for people who want to learn how to do things, not just to fill their heads with important concepts, paradigms, and heaps of information they kind of know but have no idea how to use. 

This course walks you through the full Big Data process:

  • Data Input
  • ETL
  • Predictive Modelling using Machine Learning
  • Data Visualization 
  • Deployment to AWS using the AWS Lambda and Amazon EMR bundle


Apache Hive is an easy-to-use, SQL-based tool for processing large amounts of data on Hadoop quickly. Hive gained popularity immediately after Hadoop MapReduce became widely used, because it lets you work with data by means of SQL queries. Many organisations use it to process their data. This course shows a number of interesting Hive queries and explains what Hive UDFs are.

Apache HiveMall is a Machine Learning library of tomorrow. Like Hive, it lets you use complex machine learning algorithms knowing only SQL. No need to code, compile and debug! It is really easy to use for programmers and non-programmers alike. The HiveMall library implements many useful Machine Learning algorithms (supervised classification, LDA, RandomForest, etc.) as Hive UDFs. This course focuses on Text Classification when presenting HiveMall.

Hive + HiveMall is no less (or maybe even more) attractive and efficient than Spark + Spark MLlib. Also, since HiveQL is more or less SQL, knowing SQL and only SQL will allow many non-developers to enter the BigData world.

AWS Lambda is a must-know today. I show how to use it with Java so that it can be part of a BigData pipeline. The AWS Lambda + Amazon EMR + Hive combination is also explained.

Solr and Hue are a search engine and visualisation dashboard combination. ElasticSearch and Kibana are another such combination. Both rely on the same idea: connectors push data from Hive or Spark directly to Solr or ElasticSearch. Hue and Kibana then use the properties and internal data representations of their corresponding search engines to display data on a dashboard. This course shows how to integrate Hive with both technologies.

Rather than being comprehensive, this course assumes a bit of prior knowledge of the topic. It teaches by presenting solutions to problems that occurred repeatedly while I worked on different BigData projects. It shows how mastering small things gives you the ability to create a simple solution to almost every problem, from concept to delivery.

We start with importing data into Apache Hive correctly, and slowly progress to being able to quickly deliver the results of your work as an AWS service, a Search Engine service, or a Hue dashboard.

The course covers data processing with Hive, including how to write User Defined Functions of different levels of complexity (UDF, GenericUDF, UDAF and UDTF). It then applies Machine Learning to Text Classification using HiveMall, and shows how to export data from Hive to Solr & Hue or ElasticSearch & Kibana. You will also learn how to write an AWS Lambda function that runs Hive.
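
The UDF kinds named above differ mainly in how many rows go in and how many come out. The sketch below models those shapes in plain Python as a conceptual aid only; real Hive UDFs are written in Java against Hive's own classes, and every function and class name here is hypothetical. The method names on the aggregate loosely mirror the iterate / partial result / merge / final result lifecycle of a Hive UDAF evaluator.

```python
from typing import Iterator, Set, Tuple


def to_upper(value: str) -> str:
    """UDF shape: one input row produces exactly one output value."""
    return value.upper()


def explode_words(line: str) -> Iterator[Tuple[str]]:
    """UDTF shape: one input row produces zero or more output rows."""
    for word in line.split():
        yield (word,)


class CountDistinct:
    """UDAF shape: many input rows collapse into one value.

    Partial results can be merged, which is what lets an aggregate
    run in parallel over separate chunks of the data.
    """

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def iterate(self, value: str) -> None:
        # Called once per input row.
        self.seen.add(value)

    def terminate_partial(self) -> Set[str]:
        # Intermediate state handed from one processing stage to the next.
        return set(self.seen)

    def merge(self, partial: Set[str]) -> None:
        # Combine another worker's intermediate state into this one.
        self.seen |= partial

    def terminate(self) -> int:
        # Final aggregated value.
        return len(self.seen)


# Two "workers" aggregate separate chunks, then their results are merged.
left, right = CountDistinct(), CountDistinct()
for v in ["x", "y"]:
    left.iterate(v)
for v in ["y", "z"]:
    right.iterate(v)
left.merge(right.terminate_partial())
distinct_total = left.terminate()
```

A GenericUDF has the same one-row-in, one-value-out shape as a simple UDF; the difference lies in how it handles argument types, which this sketch does not attempt to model.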

Altogether, that gives you the ability to build a data processing pipeline that is simple, robust and ready to be delivered and used in no time.

Introduction

1. Introduction

Apache Hive

1. Hive Input/Output
2. Reusing existing Java libraries in Hive

Useful Third-Party UDFs for Apache Hive

1. HyperLogLog
2. Integrating Hive and Solr
3. Row numbering and Ranking 1
4. Row numbering and Ranking 2
5. Integrating Hive and ElasticSearch

Custom Hive UDFs

1. Hive UDF Introduction
2. Hive UDF: Simple UDF
3. Hive UDF: GenericUDF
4. Hive UDF: GenericUDTF
5. Hive UDF: UDAF

Machine Learning on Hadoop

1. Introduction To HiveMall
2. Text Classification 1
3. Text Classification 2

AWS Lambda

1. AWS Lambda on Amazon
2. Simple AWS Lambda function
3. AWS Lambda as part of BigData pipeline

Visualization

1. Introduction
2. Solr
3. Hue
4. Kibana
3.7 out of 5
9 Ratings

Detailed Rating

5 stars: 3
4 stars: 2
3 stars: 2
2 stars: 1
1 star: 1
30-Day Money-Back Guarantee

Includes

2 hours on-demand video
Full lifetime access
Access on mobile and TV
Certificate of Completion