Architecting Big Data Solutions
The Big Data phenomenon is sweeping across the IT landscape. New technologies are born, new ways of analyzing data are created, and new business revenue streams are discovered every day. If you work in IT, Big Data is likely already affecting you in some way.
Building Big Data solutions is radically different from building traditional software solutions. You cannot take what you learnt in the world of traditional data solutions and apply it verbatim to Big Data. You need to understand the unique problem characteristics that drive Big Data, and also become familiar with the ever-growing range of technology options available to solve them.
This course shows you how Big Data solutions are built by stitching together Big Data technologies. It explains the modules in a Big Data pipeline, the options available for each module, and the advantages, shortcomings and use cases for each option.
This course is also a great interview preparation resource for Big Data! Anyone, whether a fresher or an experienced professional, should take this course.
Note: This is a theory course. There is no source code or programming included.
Introduction to the course
Course outline and expectations
Traditional Data vs Big Data
How traditional data solutions are built and used
How Big Data solutions are built and used
An overview of the current trends in the big data world
Big Data Architecture
An overview of Big Data Solutions
A template for Big Data architecture - modules and their flow
Current scenario for technology options in Big Data
Challenges in using Big Data technologies to build today's solutions
Data Acquisition Module
Acquire module - responsibilities, what to architect and best practices
Using SQL and flat files as acquisition options
Using HTTP REST and real time streaming for acquiring data
Transport module - responsibilities, what to architect and best practices
Using SFTP and Apache Sqoop for building Transport modules
Using Apache Flume and Apache Kafka for building Transport modules
Persistence module - responsibilities, what to architect and best practices
Using RDBMS and HDFS to build persistence modules
Using Cassandra and MongoDB to build persistence layer in a big data solution
Using Neo4j and Elasticsearch to build persistence modules
Analyzing Apache HBase and listing its advantages, shortcomings and use cases
Transform module - responsibilities, what to architect and best practices
Transform options - using MapReduce and SQL
Using Apache Spark and commercial ETL products to build transformation modules
Reporting module - responsibilities, what to architect and best practices
Using Apache Impala and Spark SQL to build reporting modules
Using third-party products and Elasticsearch for building reporting modules
Advanced Analytics Module
Advanced Analytics - responsibilities, what to architect and best practices
Using R and Python for Advanced Analytics
Using Apache Spark and commercial products for advanced analytics
Big Data Use Cases
Creating an online data backup solution with Big Data
Creating a store for large media files using Big Data
Acquiring social media data (tweets / posts) and doing real time sentiment analysis as the events happen
Doing real time credit card fraud detection on website transactions, using a big data platform for data storage and predictive analytics
Building a Big Data platform that acquires log events from a farm of servers and does real time and historical operational analytics
Developing predictive relationship models for news articles and using them to recommend items to website users
Building a customer 360 repository by acquiring data from multiple sources and integrating them into a single customer record
Building a big data platform to acquire car sensor data in real time, predict vehicle equipment failures and generate alarms
Architecting a Spam Classification solution using the techniques learnt in the course
Other courses to check out, and coupons