Big Data for Managers
This course covers the fundamentals of big data technology that you need to confidently lead a big data project in your organization. It introduces big data terminology, such as the 3 Vs of big data, and the key characteristics of big data technology, which will help you answer the question ‘How is big data technology different from traditional technology?’. You will be able to identify the stages of a big data solution, from big data ingestion to big data visualization and security, and to choose the right tool for each stage. You will see examples of popular big data tools such as HDFS, MapReduce, Spark and Zeppelin, as well as a demo of setting up an EMR cluster on Amazon Web Services. You will practice using the 5 P’s methodology of data science projects to manage a big data project. You will learn the theory and then apply it to many case studies. You will practice sizing your cluster with a template. You will explore more than 20 big data tools in the course and learn to choose a tool based on the big data problem at hand.
Introduction to the course
Use the provided template to list the big data projects (or any projects) in your organization and estimate their data sizes.
Big data characteristics
Download the technology Excel sheet and open it in Microsoft Excel. Various technology features are listed, and for each one you choose from the drop-down list whether it belongs to big data technology or traditional technology. The sheet then shows whether your answer is correct or incorrect.
Big data storage
Download the cluster sizing template to use for your big data projects.
Download the Cluster_sizing_template Excel sheet and open it in Microsoft Excel. Use it to size the various big data projects in your organization.
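To make the kind of arithmetic behind cluster sizing concrete, here is a minimal sketch. It is not the formula used in Cluster_sizing_template; it assumes a simple model in which retained data is multiplied by the HDFS replication factor (3 by default), extra headroom is reserved for temporary and intermediate files (an assumed 25%), and the total is divided by an assumed usable disk capacity per data node. The function and parameter names are hypothetical.

```python
import math


def estimate_nodes(daily_ingest_tb, retention_days,
                   replication_factor=3,          # HDFS default block replication
                   temp_space_fraction=0.25,      # assumption: headroom for temp/intermediate data
                   usable_disk_per_node_tb=48.0): # assumption: usable disk per data node
    """Hypothetical sizing model: return (raw capacity in TB, number of data nodes)."""
    if not 0 <= temp_space_fraction < 1:
        raise ValueError("temp_space_fraction must be in [0, 1)")
    stored_tb = daily_ingest_tb * retention_days              # data kept on the cluster
    replicated_tb = stored_tb * replication_factor            # every block is stored N times
    total_tb = replicated_tb / (1 - temp_space_fraction)      # add headroom for scratch space
    nodes = math.ceil(total_tb / usable_disk_per_node_tb)     # round up to whole nodes
    return total_tb, nodes


# Example: 1 TB ingested per day, kept for one year.
total_tb, nodes = estimate_nodes(daily_ingest_tb=1.0, retention_days=365)
print(f"{total_tb:.0f} TB raw capacity, {nodes} data nodes")  # 1460 TB raw capacity, 31 data nodes
```

Each factor is a separate parameter so you can replace the assumed defaults with the values your organization or the template actually uses.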
Apply what you have learned in this section to estimate the size of various storage solutions.
Big data ingestion
Apply what you have learned so far in the course to solve these big data ingestion problems.
Recap your learning by taking this quiz.
Big data analytics
Big data visualization, security and vendors
Amazon Web Services provides a ready-to-use big data cluster service called Elastic MapReduce (EMR). In this demo, I will show you how to create a big data cluster on EMR with Spark, Hadoop and Zeppelin already set up, access Spark using Zeppelin, load data stored on S3 into Spark, apply MapReduce-style processing in Spark, query the results with SQL in Zeppelin and visualize them graphically in Zeppelin.
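The MapReduce-style processing in the demo follows a three-step pattern. The sketch below shows that pattern in plain Python with a word count; it is not Spark's API and not the code used in the demo. A map step emits (key, value) pairs, a shuffle step groups the values by key, and a reduce step combines each group. In Spark the same idea is usually expressed with RDD operations such as `flatMap` and `reduceByKey`. The function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Combine the values for each key; for a word count, sum them.
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(lines):
    # Run map over every record, then shuffle and reduce the combined output.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


print(map_reduce(["big data is big", "Big data tools"]))
# {'big': 3, 'data': 2, 'is': 1, 'tools': 1}
```

On a real cluster, the map and reduce steps run in parallel on different nodes, and the shuffle moves data across the network between them.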
Big data projects
Please take this quiz to reinforce the learning from the course.
This lecture was added at the request of a student. In the earlier Amazon EMR demo, I used an existing S3 bucket. This demo shows how to create a bucket and upload a file to Amazon S3.
On request, I have added the answers to the storage exercise so that you can check your solutions.