4.8 out of 5
25 reviews on Udemy

Real World Vagrant – Automate a Cloudera Manager Build

Build a Distributed Cluster of Cloudera Manager and any number of Cloudera Manager Agent nodes with a single command!
Toyin Akin
339 students enrolled
Simply run a single command on your desktop, go for a coffee, and come back with a running distributed environment for cluster deployment
Quickly build an environment where Cloudera and Hadoop software can be installed
Ability to automate the installation of software across multiple Virtual Machines

Note : This course is built on top of the “Real World Vagrant For Distributed Computing – Toyin Akin” course

“NoSQL”, “Big Data”, “DevOps” and “In Memory Database”
technologies are hot and highly valuable skills to have – and this
course will teach you how to quickly create a distributed environment
for you to deploy these technologies on.

A combination of VirtualBox and Vagrant will transform your desktop
machine into a virtual cluster. However, this needs to be configured
correctly. Simply enabling multi-node within Vagrant is not good enough;
it needs to be tuned. Developers and operators within large enterprises,
including investment banks, use Vagrant to simulate Production environments.

After all, if you are developing against or operating a distributed
environment, it needs to be tested, both in terms of the code deployed and
the deployment code itself.

You’ll learn the same techniques these enterprise guys use on your own Microsoft Windows computer/laptop.

Vagrant provides easy to configure, reproducible, and portable work
environments built on top of industry-standard technology and controlled
by a single consistent workflow to help maximize the productivity and
flexibility of you and your team.

This course will use VirtualBox to carve out your virtual
environment. However, the same skills learned with Vagrant can be used to
provision virtual machines on VMware, AWS, or any other provider.

If you are a developer,
this course will help you isolate dependencies and their
configuration within a single disposable, consistent environment,
without sacrificing any of the tools you are used to working with
(editors, browsers, debuggers, etc.). Once you or someone else creates a
single Vagrantfile, you just need to vagrant up and everything is
installed and configured for you to work. Other members of your team
create their development environments from the same configuration. Say
goodbye to “works on my machine” bugs.

If you are an operations engineer,
this course will help you build a disposable environment and consistent
workflow for developing and testing infrastructure management scripts.
You can quickly test your deployment scripts and more using local
virtualization such as VirtualBox or VMware (VirtualBox for this
course). Ditch your custom scripts to recycle EC2 instances, stop
juggling SSH prompts to various machines, and start using Vagrant to
bring sanity to your life.

If you are a designer, this course will
help you with distributed installation of software in order for you to
focus on doing what you do best: design. Once a developer configures
Vagrant, you do not need to worry about how to get that software running
ever again. No more bothering other developers to help you fix your
environment so you can test designs. Just check out the code, vagrant
up, and start designing.


Here is a suggested curriculum based on the current state of my Cloudera courses.

My Hadoop courses are based on Vagrant so that you can practice and
destroy your virtual environment before applying the installation onto
real servers/VMs.


For those with little or no knowledge of the Hadoop ecosystem
Udemy course : Big Data Intro for IT Administrators, Devs and Consultants


I would first practice with Vagrant so that you can carve out a
virtual environment on your local desktop. You don’t want to corrupt
your physical servers if you do not understand the steps or make a mistake.
Udemy course : Real World Vagrant For Distributed Computing


I would then, on the virtual servers, deploy Cloudera Manager plus
agents. Agents are the guys that will sit on all the slave nodes, ready
to deploy your Hadoop services.
Udemy course : Real World Vagrant – Automate a Cloudera Manager Build


Then deploy the Hadoop services across your cluster (via the
installed Cloudera Manager in the previous step). We look at the logic
regarding the placement of master and slave services.
Udemy course : Real World Hadoop – Deploying Hadoop with Cloudera Manager


If you want to play around with HDFS commands (Hands on distributed file manipulation).
Udemy course : Real World Hadoop – Hands on Enterprise Distributed Storage.


You can also automate the deployment of the Hadoop services via
Python (using the Cloudera Manager Python API). But this is an advanced
step and thus I would make sure that you understand how to manually
deploy the Hadoop services first.
Udemy course : Real World Hadoop – Automating Hadoop install with Python!


There is also the upgrade step. Once you have a running cluster, how
do you upgrade to a newer Hadoop cluster (both for Cloudera Manager and
the Hadoop services)?
Udemy course : Real World Hadoop – Upgrade Cloudera and Hadoop hands on

Vagrant for Big Data Testing

Automating a Cloudera Manager Build with Vagrant

Here we try to justify using Vagrant to automate a Cloudera Manager build.

Suggested course curriculum to follow ...

Set up our Vagrantfile so that we can build our box templates

Base Vagrant file
Here we walk through a simple Vagrant Script


Modify the hosts file to make it Cloudera friendly

Even though we use the vagrant hostmanager to manage the /etc/hosts file, we take control and handle the guest /etc/hosts file ourselves.
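
As a rough illustration of handling the guest /etc/hosts file yourself, the sketch below renders the file contents from a node list in plain Ruby (the language of a Vagrantfile). The node names, IP addresses, domain suffix, box name and the render_hosts helper are all hypothetical, not taken from the course. It assumes each host should resolve to its fully qualified name first, with only localhost mapped to 127.0.0.1.

```ruby
# Hypothetical cluster layout: names and addresses are illustrative only.
NODES = [
  { name: "manager", ip: "192.168.56.10" },
  { name: "agent1",  ip: "192.168.56.11" },
  { name: "agent2",  ip: "192.168.56.12" },
].freeze

DOMAIN = "example.local" # assumed domain suffix

# Build the full text of a guest /etc/hosts file.
# Each node gets "IP FQDN shortname"; only localhost maps to 127.0.0.1.
def render_hosts(nodes, domain)
  lines = ["127.0.0.1 localhost localhost.localdomain"]
  nodes.each do |node|
    lines << "#{node[:ip]} #{node[:name]}.#{domain} #{node[:name]}"
  end
  lines.join("\n") + "\n"
end

# When loaded by Vagrant, write the rendered text with an inline shell provisioner.
# The guard lets the helper above also run under plain Ruby.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    config.vm.box = "centos/7" # assumed base box
    config.vm.provision "shell",
      inline: "cat > /etc/hosts <<'EOF'\n#{render_hosts(NODES, DOMAIN)}EOF\n"
  end
end
```

Generating the file from one node list keeps every guest's view of the cluster identical, which is the point of taking control away from a plugin.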

Here we download the Cloudera Manager rpms and create a local repository

In this lecture, we download the Cloudera Manager rpms and create a local repository. As we will be automating the installation of the Cloudera components, the installation will be non-interactive.
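
Once the rpms sit in a directory served over HTTP and have been indexed (for example with the createrepo tool), each node needs a yum .repo file pointing at that location. The Ruby sketch below only renders such a file; the repository id, URL and the local_repo_file helper are hypothetical, and disabling gpgcheck is an assumption that suits a throwaway test environment only.

```ruby
# Render a yum .repo file that points at a local repository.
# repo_id and base_url are illustrative values supplied by the caller.
def local_repo_file(repo_id, base_url)
  <<~REPO
    [#{repo_id}]
    name=#{repo_id} (local mirror)
    baseurl=#{base_url}
    enabled=1
    gpgcheck=0
  REPO
end

# Example: the text that could be written to /etc/yum.repos.d/cloudera-manager.repo
puts local_repo_file("cloudera-manager", "http://manager.example.local/cm/")
```

With the repository defined this way, a later yum install can run with the -y flag and no prompts, which is what makes the build non-interactive.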

Here we configure the CentOS O/S: firewall, NTP, TCP buffers and swappiness

In this video, we configure the CentOS O/S: firewall, NTP, TCP buffers and swappiness settings. We do as much as possible to satisfy the best-practice requirements for tuning the O/S for Hadoop nodes.
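
A minimal sketch of what such a tuning step might look like as a provisioning script, generated from Ruby so it can be passed to a Vagrant shell provisioner. It assumes a CentOS 7 guest with systemd and firewalld; the swappiness and buffer values, the sysctl file name and the os_tuning_script helper are illustrative assumptions, not the course's settings.

```ruby
# Render a shell script covering the four O/S settings named above.
# All default values are illustrative assumptions.
def os_tuning_script(swappiness: 1, rmem_max: 4_194_304, wmem_max: 4_194_304)
  <<~SHELL
    #!/bin/bash
    set -e
    # Persist kernel settings in a dedicated file, then load them.
    echo "vm.swappiness=#{swappiness}" > /etc/sysctl.d/99-hadoop.conf
    echo "net.core.rmem_max=#{rmem_max}" >> /etc/sysctl.d/99-hadoop.conf
    echo "net.core.wmem_max=#{wmem_max}" >> /etc/sysctl.d/99-hadoop.conf
    sysctl -p /etc/sysctl.d/99-hadoop.conf
    # Keep clocks in sync across the cluster.
    yum install -y ntp
    systemctl enable ntpd
    systemctl start ntpd
    # Turn the firewall off inside the private test network only.
    systemctl stop firewalld || true
    systemctl disable firewalld || true
  SHELL
end

puts os_tuning_script
```

In a Vagrantfile the result could be used as `config.vm.provision "shell", inline: os_tuning_script`. On a CentOS 6 guest the systemctl lines would need replacing with the service and chkconfig equivalents.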

Set up a local webserver to house Cloudera's CDH Parcels

Here, we set up a local webserver to house Cloudera's CDH Parcels. CDH parcels hold the binaries for the Hadoop cluster. Cloudera's Parcels are alternatives to rpms.

Here we find Cloudera's Online Parcel Repository and download a Parcel

In this lecture, we find Cloudera's Online Parcel Repository and download a Parcel.

Automate the Installation of Cloudera Manager and Agents

Here, we complete the Cloudera setup by automating the installation of Cloudera Manager and Agents

Quickly validate our template by walking through the Cloudera Manager UI

Here, we quickly validate our vagrant template file by walking through the test Cloudera Manager UI

Package the Manager and Agent into Vagrant image templates

Package the Cloudera Components. Manager and Agent

Here we create two Cloudera Vagrant boxes. We export the Manager and Agent Virtual Machines. These will become our base boxes to boot up our cluster. No need to install components anymore!
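
Exporting a machine and registering it as a base box comes down to two Vagrant CLI commands per machine: vagrant package and vagrant box add. The Ruby sketch below only prints those commands; the machine names, box names and the package_commands helper are hypothetical.

```ruby
# Return the two CLI commands that export a Vagrant machine to a .box file
# and then register that file as a named base box.
def package_commands(machine, box_name)
  box_file = "#{box_name}.box"
  [
    "vagrant package #{machine} --output #{box_file}",
    "vagrant box add --name #{box_name} #{box_file}",
  ]
end

# Assumed machine-to-box mapping for the Manager and Agent templates.
{ "manager" => "cloudera-manager", "agent" => "cloudera-agent" }.each do |machine, box|
  package_commands(machine, box).each { |cmd| puts cmd }
end
```

The commands would be run from the directory holding the Vagrantfile that defines those machines.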

Boot up a Cluster topology using the new Cloudera vagrant base boxes

Here we boot up a Cluster topology using the new Cloudera vagrant base boxes
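
A cluster topology of this kind can be sketched as a multi-machine Vagrantfile that defines one Manager node and a configurable number of Agent nodes. Everything specific here is an assumption for illustration: the box names, the 192.168.56.x addresses, the memory sizes and the cluster_nodes helper.

```ruby
# Describe one manager plus agent_count agents, built from the packaged boxes.
def cluster_nodes(agent_count, subnet: "192.168.56", first_octet: 10)
  manager = { name: "manager", box: "cloudera-manager",
              ip: "#{subnet}.#{first_octet}", memory: 4096 }
  agents = (1..agent_count).map do |i|
    { name: "agent#{i}", box: "cloudera-agent",
      ip: "#{subnet}.#{first_octet + i}", memory: 2048 }
  end
  [manager] + agents
end

# When loaded by Vagrant, define one VirtualBox machine per node.
# The guard lets the helper above also run under plain Ruby.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    cluster_nodes(3).each do |node|
      config.vm.define node[:name] do |machine|
        machine.vm.box = node[:box]
        machine.vm.hostname = node[:name]
        machine.vm.network "private_network", ip: node[:ip]
        machine.vm.provider "virtualbox" do |vb|
          vb.memory = node[:memory]
        end
      end
    end
  end
end
```

Changing the argument to cluster_nodes is then the only edit needed to grow or shrink the cluster before running vagrant up.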

First pass - Deploying a Hadoop Cluster.

Here we have our first pass of deploying a Hadoop Cluster.

Second pass - Deploying a Hadoop Cluster

Here we quickly go through our final pass of installing the cluster

Bonus - Hadoop Services that require access to a database.

We look at the issues you may face when deploying services, such as Hive, that require access to the embedded PostgreSQL database, and we detail the solution.



Final words



3 hours on-demand video
2 articles
Full lifetime access
Access on mobile and TV
Certificate of Completion