4.5 out of 5
21 reviews on Udemy

Streaming Analytics on Google Cloud Platform

The Fifth Course in a Series for Attaining the Google Certified Data Engineer
Mike West
301 students enrolled
English [Auto-generated]
You'll understand the core structures of Apache Beam.
You'll know how to author a simple streaming application on Google's Cloud.
You'll be well versed in all the vernacular of streaming.
You'll be ready to handle all the questions on the Google Certified Data Engineering exam that are related to Cloud Dataflow.

Review from course in this series

“I like the detail, especially highlighting the specifics of the test. The detail makes this course worth the investment including the summary at the end and the quizzes that test my knowledge.”  — Valentina Kibuyaga

Welcome to Streaming Analytics on Google Cloud Platform. This is the fifth and final course in a series designed to help you attain the coveted Google Certified Data Engineer certification.

Additionally, the series will show you the role of the data engineer on the Google Cloud Platform.

While this is a short course, the subject matter is dense. You won’t have to author Java pipelines for the exam, but you will need to know a lot about how they are created and executed.

At this juncture, the Google Certified Data Engineer is the only real-world certification for data and machine learning engineers.

NOTE: This is NOT a course on programming Apache Beam Pipelines. This is a very targeted course on understanding how Apache Beam and Cloud Dataflow provide us with an infrastructure to build pipelines for streaming data. The course will provide the learner with the nomenclature and process understanding they’ll need to pass the Certified Data Engineering Exam. 

Streaming data processing is a big deal in big data these days, and for good reasons. Businesses crave ever more timely data, and switching to streaming is a good way to achieve lower latency.

The massive, unbounded data sets that are increasingly common in modern business are more easily tamed using a system designed for such never-ending volumes of data.

Processing data as it arrives spreads workloads out more evenly over time, yielding more consistent and predictable consumption of resources.

On Google Cloud Platform, the main tool we use for building these pipelines is Cloud Dataflow. The product itself is a fusion of code written by Google developers and code from the Apache Software Foundation. The project that came out of that collaboration is Apache Beam.

Apache Beam (Batch + strEAM) is a model and set of APIs for doing both batch and streaming data processing. It was open-sourced by Google (with Cloudera and PayPal) in 2016 via an Apache incubator project.

In this course, we are going to learn about Apache Beam and Cloud Dataflow. While this is an entry-level course, streaming will be new to many. As in most of my other courses in this series, I’ll attempt to break down the more complicated topics pictorially.

*Five Reasons to Take This Course*

1) You Want to be a Data Engineer 

It’s the number one job in the world (not just within the computer space). The growth potential career-wise is second to none. You want the freedom to move anywhere you’d like. You want to be compensated for your efforts. You want to be able to work remotely. The list of benefits goes on.

2) The Google Certified Data Engineer 

Google is always ahead of the game. If you were to look back at a timeline of their accomplishments in the data space, you might believe they have a crystal ball. They’ve been a decade ahead of everyone. Now, they are the first and only cloud vendor to have a data engineering certification. With their track record, I’ll go with Google.

3) The Growth of Data is Insane 

Ninety percent of all the world’s data has been created in the last two years. Businesses around the world generate approximately 450 billion transactions a day. The amount of data collected by all organizations is approximately 2.5 exabytes a day. That number doubles every month.

4) Apache Beam in Plain English

Apache Beam pipelines require basic programming skills. The Google Certified Data Engineering exam will require you to identify the parts of a Beam pipeline, in addition to understanding some of the vernacular and nuances behind streaming data.

5) You Want to be Ahead of the Curve

The data engineer role is new. While you’re learning, building your skills and becoming certified, you are also among the first to be part of this burgeoning field. You know that the first to be certified means the first to be hired and the first to receive the top compensation package.

Thank you for your interest in Streaming Analytics on Google Cloud Platform. We will see you in the course!



In this first lesson let's learn what this course is about. 

It's our introduction to Apache Beam and Cloud Dataflow for this course. 

Is this Course for You?

I want you to take my course but I want the course to be right for you. 

In this lesson let's learn if you are part of the course's target audience. 

What is Streaming?

In this lesson let's look at what streaming is at a high level.

Let's define streaming and a few terms we will use throughout the course.

The 3 Vs of Big Data

In this lesson let's learn about the big three when it comes to big data. 

The Beam Pipeline

Apache Beam

Definition and History

In this lesson let's learn what Apache Beam is. 

It is an integral part of Cloud Dataflow, so in this section we will learn all about it.

Beam Object Model

In this lesson we learn about the various objects that make up a Beam Pipeline. 

Pipeline Object Review

Let's do a quick review of the critical objects in Beam. 

The answer key is in the lecture below. 

Object Review Answer Key

The answer key to the "Pipeline Object Review" lecture. 

Event Time and Processing Time

Let's learn the two core terms and concepts surrounding streaming data.

Understanding how these two times relate is the cornerstone of understanding streaming data sets.


How do we slice up infinite, out-of-order data sets?

You use time windows. 

In this lesson let's learn how this happens. 
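To make the idea of time windows concrete, here is a minimal sketch in plain Python (not Apache Beam code) that assigns out-of-order events to fixed windows based on their event time. The 60-second window size, the function names, and the sample events are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical window size chosen for illustration


def window_start(event_time: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % size)


def assign_to_windows(events, size: int = WINDOW_SECONDS):
    """Group (event_time, value) pairs by fixed event-time window."""
    windows = defaultdict(list)
    for event_time, value in events:
        windows[window_start(event_time, size)].append(value)
    return dict(windows)


# Events arrive out of order; each tuple is (event_time_in_seconds, value).
events = [(65, "b"), (10, "a"), (130, "d"), (59, "c")]
windowed = assign_to_windows(events)
```

Because each event is placed by when it happened rather than when it arrived, the late-arriving events at 10 and 59 seconds still land in the first window.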

The Mobile App

Let's create a fictitious use case so we can better understand streaming data sets.

Handling Data Tensions

There are issues that arise from dealing with infinite unordered data sets. 

Let's learn what they are. 


It's the paper that started it all. 

Let's learn how MapReduce works at a high level.
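As a rough illustration of the MapReduce idea, the sketch below runs a word count through map, shuffle, and reduce phases in plain Python on a single machine. The function names and sample documents are hypothetical; a real MapReduce system distributes these phases across many workers.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["stream batch stream", "Batch beam"])
```

The key point for the exam is the shape of the computation: independent map steps, a grouping by key, then an independent reduce per key.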

FlumeJava and Batch Patterns

Event Skew

In this lesson let's learn what event skew is. 
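A small sketch can help here, assuming that skew means the gap between when an event occurred (event time) and when the pipeline observed it (processing time). The `Event` class, field names, and sample values are hypothetical and written in plain Python rather than Beam.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    event_time: int       # seconds at which the event occurred
    processing_time: int  # seconds at which the pipeline observed it


def skew(event: Event) -> int:
    """Return how many seconds passed between occurrence and processing."""
    return event.processing_time - event.event_time


events = [
    Event("click", 100, 102),
    Event("purchase", 90, 150),
    Event("view", 120, 121),
]

# Skew per event, plus the event that lagged the most.
skews = {e.name: skew(e) for e in events}
most_delayed = max(events, key=skew)
```

In this sample the purchase happened first but was processed last, which is exactly the kind of out-of-order arrival a streaming system has to tolerate.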


The Dataflow Model

Cloud Dataflow: The SDK and the Runner

Apache Beam is an SDK for developing pipelines.

Cloud Dataflow is a runner for executing those pipelines.

Let's learn more about them in this brief lesson. 

The 4 Core Questions of Dataflow

We've already seen these once, but let's review the questions once more.

Lab: Building a Dataflow Pipeline

Let's build our own pipeline and then execute it on Cloud Dataflow.

Dataflow Job Monitoring UI

We need to be able to monitor our jobs. 

Stackdriver and Dataflow

We can easily monitor Dataflow and most of our other services using Stackdriver.

Simple Dashboard

In this lesson let's create a simple dashboard for monitoring Dataflow.

Lab: Monitoring Dataflow

In this lesson let's learn how to monitor our Dataflow jobs.

30-Day Money-Back Guarantee


1 hour on-demand video
8 articles
Full lifetime access
Access on mobile and TV
Certificate of Completion