4.38 out of 5 (8 reviews on Udemy)

Big Data Intro for IT Administrators, Devs and Consultants

Grasp why "Big Data" knowledge is in hot demand for Developers / Consultants and Admins
Instructor:
Toyin Akin
266 students enrolled
Grasp why "Big Data" is the current Gold Rush for Developers / Consultants and Admins
Understand the Hadoop ECO System.
Basic HDFS / YARN / Hive / Sqoop / Spark will be covered with examples

Understand “Big Data” and grasp why, if you are a Developer, Database Administrator, Software Architect or an IT Consultant, you should be looking at this technology stack

There are more job opportunities in Big Data management and analytics than there were last year, and many IT professionals are prepared to invest time and money in training.

Why Is Big Data Different?

In the old days… you know… a few years ago, we would use systems to extract, transform and load (ETL) data into giant data warehouses that had business intelligence solutions built over them for reporting. Periodically, all the systems would back up and combine the data into a database where reports could be run and everyone could get insight into what was going on.

The problem was that the database technology simply couldn’t handle multiple, continuous streams of data. It couldn’t handle the volume of data. It couldn’t modify the incoming data in real time. And the reporting tools couldn’t handle anything but a relational query on the back-end. Big Data solutions offer cloud hosting, highly indexed and optimized data structures, automatic archival and extraction capabilities, and reporting interfaces designed to provide more accurate analyses that enable businesses to make better decisions.

Better business decisions mean that companies can reduce the risk of their decisions, cut costs and increase marketing and sales effectiveness.

What Are the Benefits of Big Data?

This infographic from Informatica walks through the risks and opportunities associated with leveraging big data in corporations.

Big Data is Timely – Knowledge workers spend a large percentage of each workday just attempting to find and manage data.

Big Data is Accessible – Senior executives report that accessing the right data is difficult.

Big Data is Holistic – Information is currently kept in silos within the organization. Marketing data, for example, might be found in web analytics, mobile analytics, social analytics, CRMs, A/B testing tools, email marketing systems, and more… each focused on its own silo.

Big Data is Trustworthy – Organizations measure the monetary cost of poor data quality. Things as simple as monitoring multiple systems for customer contact information updates can save millions of dollars.

Big Data is Relevant – Organizations are dissatisfied with their tools’ ability to filter out irrelevant data. Something as simple as filtering customers out of your web analytics can provide a ton of insight into your acquisition efforts.

Big Data is Authoritative – Organizations struggle with multiple versions of the truth depending on the source of their data. By combining multiple, vetted sources, more companies can produce highly accurate intelligence sources.

Big Data is Actionable – Outdated or bad data results in organizations making bad decisions that can cost billions.


Here I present a suggested curriculum reflecting the current state of my Cloudera courses.

My Hadoop courses are based on Vagrant so that you can practice on, and then destroy, a virtual environment before applying the installation to real servers/VMs.


For those with little or no knowledge of the Hadoop ecosystem
Udemy course : Big Data Intro for IT Administrators, Devs and Consultants


I would first practice with Vagrant so that you can carve out a virtual environment on your local desktop. You don’t want to corrupt your physical servers if you do not understand the steps or make a mistake.
Udemy course : Real World Vagrant For Distributed Computing
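The day-to-day Vagrant loop looks roughly like this; the box name "centos/7" and the directory name are just illustrative choices:

```shell
# Create a project directory and initialise a Vagrantfile for a CentOS box.
mkdir hadoop-sandbox && cd hadoop-sandbox
vagrant init centos/7

# Boot the VM and SSH into it to experiment.
vagrant up
vagrant ssh

# When you have broken it (or finished), throw the whole VM away.
vagrant destroy -f
```

The point of the destroy step is exactly the safety net described above: mistakes cost you a disposable VM, not a physical server.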


I would then, on the virtual servers, deploy Cloudera Manager plus agents. Agents are the guys that will sit on all the slave nodes, ready to deploy your Hadoop services.
Udemy course : Real World Vagrant – Automate a Cloudera Manager Build
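On RHEL/CentOS-style nodes, the agent install is a sketch along these lines; the server hostname is a placeholder, and Cloudera's yum repository must already be configured on the node:

```shell
# Install the agent packages on each node that will run Hadoop roles.
sudo yum install -y cloudera-manager-agent cloudera-manager-daemons

# Point the agent at the Cloudera Manager server so it can check in
# (cm-master.example.com is a placeholder hostname).
sudo sed -i 's/^server_host=.*/server_host=cm-master.example.com/' \
    /etc/cloudera-scm-agent/config.ini

# Start the agent; the node should then appear as a managed host in the CM UI.
sudo systemctl start cloudera-scm-agent
```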


Then deploy the Hadoop services across your cluster (via the Cloudera Manager installed in the previous step). We look at the logic regarding the placement of master and slave services.
Udemy course : Real World Hadoop – Deploying Hadoop with Cloudera Manager


If you want to play around with HDFS commands (hands-on distributed file manipulation).
Udemy course : Real World Hadoop – Hands on Enterprise Distributed Storage.
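As a flavour of what that hands-on work looks like, a typical HDFS session on a working cluster might be (paths and filenames here are illustrative):

```shell
# Create a directory in HDFS and upload a local file into it.
hdfs dfs -mkdir -p /user/demo/input
hdfs dfs -put ./sales.csv /user/demo/input/

# List the directory and peek at the file's contents.
hdfs dfs -ls /user/demo/input
hdfs dfs -cat /user/demo/input/sales.csv | head

# Copy the file back out of HDFS, then clean up.
hdfs dfs -get /user/demo/input/sales.csv ./sales_copy.csv
hdfs dfs -rm -r /user/demo/input
```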


You can also automate the deployment of the Hadoop services via Python (using the Cloudera Manager Python API). But this is an advanced step, and thus I would make sure that you understand how to manually deploy the Hadoop services first.
Udemy course : Real World Hadoop – Automating Hadoop install with Python!
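To give a feel for that API, here is a minimal sketch using the (now legacy) cm-api Python client; the hostname and credentials are placeholders:

```shell
# Install the Cloudera Manager Python client, then list the clusters
# that the Manager knows about.
pip install cm-api
python - <<'EOF'
from cm_api.api_client import ApiResource

# Placeholder host and credentials - substitute your own.
api = ApiResource('cm-master.example.com', username='admin', password='admin')
for cluster in api.get_all_clusters():
    print(cluster.name)
EOF
```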


There is also the upgrade step. Once you have a running cluster, how do you upgrade to a newer Hadoop cluster (both for Cloudera Manager and the Hadoop services)?
Udemy course : Real World Hadoop – Upgrade Cloudera and Hadoop hands on
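On RHEL/CentOS, the Cloudera Manager side of an upgrade looks roughly like this, after first updating the yum repo file to point at the newer version; always check Cloudera's upgrade documentation for your specific versions:

```shell
# Stop the Cloudera Manager server before upgrading its packages.
sudo systemctl stop cloudera-scm-server

# Pull in the newer packages from the updated repository.
sudo yum clean all
sudo yum upgrade -y cloudera-manager-server cloudera-manager-daemons \
    cloudera-manager-agent

# Start the server again; it then walks you through upgrading the agents
# and the Hadoop services from its web UI.
sudo systemctl start cloudera-scm-server
```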

As a Developer, Administrator or Architect - Why should you consider "Big Data"

1. As a Developer, Administrator or Architect - Why should you consider "Big Data"
2. Suggested course curriculum to follow ...

Whiteboarding Sessions

1. Whiteboarding the rationale
2. Part I - Whiteboarding some of the Hadoop Services
3. Part II - Whiteboarding some of the Hadoop Services
4. Part III - Whiteboarding some of the Hadoop Services

Enterprise Examples

1. We step through an example RDBMS system used by retailers
2. Hadoop Distributors - apache.org, Cloudera, Hortonworks and MapR
3. Hadoop Cloud Operators - Amazon EMR and Microsoft Azure. We look at some of the pros and cons of Hadoop cloud deployments.

Hands on Hadoop Services

1. Operating a Local Hadoop Installation
2. SQOOP SERVICE - Move data from the database into Hadoop. We target some database tables and show how we can move them from MySQL into Hadoop.
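A Sqoop import of that shape looks something like this; the connection string, credentials and table name are placeholders:

```shell
# Copy one MySQL table into HDFS; requires the MySQL JDBC driver on the
# node, and -P prompts interactively for the database password.
sqoop import \
  --connect jdbc:mysql://db-host.example.com/retail_db \
  --username sqoop_user -P \
  --table customers \
  --target-dir /user/demo/customers \
  --num-mappers 2
```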

3. HIVE SERVICE - We apply SQL statements within Hadoop on the copied data. The Hive service provides us a logical database within Hadoop; as Hadoop can handle petabytes of data, you should be able to imagine a logical database that can crunch petabytes of data.
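For example, a table can be laid over the imported files and queried with plain SQL; the table layout and HDFS path below are illustrative:

```shell
# Define a Hive table over the files Sqoop wrote, then aggregate it.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS customers (
  id INT, name STRING, city STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/demo/customers';

SELECT city, COUNT(*) AS num_customers
FROM customers
GROUP BY city;
"
```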

4. HIVE SERVICE II - We apply SQL statements within Hadoop on the copied data.

5. HDFS SERVICE - We move some files into HDFS, ready for Spark processing.

6. SPARK SERVICE - We perform data analytics based on the data copied into HDFS.
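A quick illustration of that kind of analysis from the PySpark shell; the HDFS path and column names are made up for the example:

```shell
# Start the PySpark shell and aggregate a CSV file stored in HDFS.
# In the shell, the SparkSession is predefined as `spark`.
pyspark <<'EOF'
df = spark.read.csv('hdfs:///user/demo/input/sales.csv',
                    header=True, inferSchema=True)
df.groupBy('region').sum('amount').show()
EOF
```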

Conclusion

1. Conclusion
You can view and review the lecture materials indefinitely, like an on-demand channel.
Definitely! If you have an internet connection, courses on Udemy are available on any device at any time. If you don't have an internet connection, some instructors also let their students download course lectures. That's up to the instructor though, so make sure you get on their good side!
4.4 out of 5 (8 Ratings)

Detailed Rating

5 stars: 5
4 stars: 2
3 stars: 0
2 stars: 1
1 star: 0
30-Day Money-Back Guarantee

Includes

3 hours on-demand video
1 article
Full lifetime access
Access on mobile and TV
Certificate of Completion