An Introduction to Google Cloud Platform for Data Engineers
“I want to say thank you for all your courses. They are awesome and you are a great teacher. I passed my GCP Data Engineer exam yesterday. Your courses had a big part in that. Thanks again!” — Steve D.
“I took a few of your courses and you are an amazing teacher. Your courses have brought me up to speed on how to create databases and how to interact and handle Data Engineers and Data Scientists. I will be forever grateful.” -Tony
Welcome to An Introduction to Google Cloud Platform for Data Engineers. This is the first course in a series designed to help you attain the coveted Google Certified Data Engineer certification.
At this point, the Google Certified Data Engineer is the only real-world certification for data and machine learning engineers.
Harvard Business Review dubbed data scientist the sexiest job of the 21st century. It might be the greatest job in the world if you have a dedicated team of data engineers at your beck and call. Otherwise, your first week of work is going to be a tough one.
Most applied machine learning is supervised. That simply means you use a labeled data set to build your machine learning or deep learning models.
In the real world, data is messy, complicated and very difficult to work with. Another real-world issue is working with data at scale. Very few companies have the compute or storage resources to work with large data sets. That’s where the Google Cloud Platform comes in. The tools Google has built for the cloud are the tools it uses internally.
This is the first course in a series of courses dedicated to learning the Google Cloud Platform and attaining the Google Certified Data Engineer certification.
This course will lay the foundation for what you’ll need in order to become a data engineer and pass the exam. In the course we will cover all the basics of the platform so that you’ll be prepared to move into the more advanced, data engineering specific courses.
Please keep in mind this course alone will not give you the knowledge and skills to pass the exam. The course will lay the foundation for what you’ll need to know before you begin a more intensive study of the services specific to data on the Google Cloud Platform.
**Five Reasons to Take this Course**
1) Google is the World Leader in Machine Learning
Google knows data and they know big data. They work with some of the largest data sets in the world. They also have one of the largest data science teams on the planet. Companies that know data science know data engineering. Google has been at the forefront of data exploration for years.
2) Occam’s Razor Approach to Teaching
Less is almost always more. If you’re serious about data engineering as a career you don’t need or want your hand held for long periods of time. You want the core of any subject and then you want to get your hands dirty. My courses are relatively short and to the point. You don’t have time for filler and I don’t believe in adding it.
3) Real World Instructor Experience
I’ve been working with data for over two decades. I’ve authored books, created applications that are used in the real world and have over 30 courses on Udemy specific to data. I’ve worked with over 50 different companies as an employee or consultant. Data has been my life for two decades and all of that has been in the real world.
4) The Only Data Engineering Certification on the Market
I have to be honest with you. I’m not a huge fan of most master’s programs for machine learning or data-related degrees. There is still quite a gap between what colleges teach and what companies expect. I believe certifications offer a better path into the real world of data and machine learning. A master’s degree might get you the interview, but hands-on experience with the product and the certification will get you the job.
5) Limited Data Engineering Courses
At the time I authored this course there were no other courses on the market dedicated to learning the skills you’ll need for passing the Google Certified Data Engineer exam. My courses aren’t specific to learning how to pass the exam. I want you to be able to pass the exam, but more importantly I want you to be able to apply what you’ve learned in my courses in order to work with various real-world data sets.
What's the course about?
This is the course introduction.
Let's decide if this course is right for you.
Is your goal to become a data engineer or work on Google's Cloud Platform?
This lesson is a Q&A with yours truly about some of the specifics of the course and the Google Certified Data Engineer Certification.
The resource hierarchy is a high-level look at how Google has laid out their cloud from a project perspective.
Google's certification model is different from other vendors.
Google uses the concept of the case study.
In this lesson let's take a look at the case study example that Google uses.
In this lesson let's discuss the core resources on GCP.
In this lesson let's learn about the core resources you and I will need to know as data engineers.
Setting Up Shop
Before we dive into GCP (Google Cloud Platform) we need an account.
Let's learn how to set one up in this lesson.
You need two things to create an account: a Gmail account and a credit card.
We will be spending quite a bit of time in the console.
In this lesson let's learn what the GCP console is and the basics of navigating it.
Let's learn what Cloud Shell is and how we can use it effectively.
The project is the central object on GCP.
Let's learn about it in this lesson.
Billing isn't exciting but it's a must know.
Let's walk through the billing screen and learn how to set up quotas.
APIs are services that Google offers.
The core ones are turned on by default but many others are not.
Let's take a look at the core services and learn how to turn on those that aren't by default.
Let's take a look at how projects are secured.
We have two core items to learn about: permissions for people and permissions for services.
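To make the people versus services split a little more concrete, here is a minimal sketch in plain Python. It models a policy as a list of role bindings, where each member string carries a `user:` or `serviceAccount:` prefix. It does not call any Google API. The e-mail addresses, the project name inside them, and the helper names `roles_for` and `grant` are made up for illustration.

```python
# Illustrative only: a plain-Python model of an IAM-style policy.
# The members and the helper functions below are hypothetical.

policy = {
    "bindings": [
        {
            # A permission granted to a person.
            "role": "roles/viewer",
            "members": ["user:alice@example.com"],
        },
        {
            # A permission granted to a service (a service account).
            "role": "roles/storage.objectViewer",
            "members": [
                "serviceAccount:etl-job@my-project.iam.gserviceaccount.com"
            ],
        },
    ]
}


def roles_for(policy, member):
    """Return the sorted list of roles bound to a member in the policy."""
    return sorted(
        binding["role"]
        for binding in policy["bindings"]
        if member in binding["members"]
    )


def grant(policy, role, member):
    """Add a member to a role, creating the binding if it does not exist."""
    for binding in policy["bindings"]:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            return
    policy["bindings"].append({"role": role, "members": [member]})
```

Calling `grant(policy, "roles/storage.objectViewer", "user:alice@example.com")` and then `roles_for(policy, "user:alice@example.com")` would list both roles for that user.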
This is the command-line tool that we will use to do many administrative tasks on GCP.
In this lesson let's learn about Compute Engine.
It's one of the core resources on GCP.
Google Compute Engine lets you create and run virtual machines on Google infrastructure.
In this lecture let's spin up a Compute Engine instance.
Do keep in mind that this is all within the confines of a project.
Cloud Launcher is the easiest way to deploy packaged software solutions to GCP.
In this lesson let's deploy a solution in under three minutes.
We have tons of options when working with Compute Engine.
Let's walk through them at a high level in this lesson.
Using containers, everything required to make a piece of software run is packaged into isolated containers.
Unlike VMs, containers do not bundle a full operating system - only libraries and settings required to make the software work are needed.
In this lesson let's define what Docker is in plain English.
In this lesson let's deploy a Docker solution to GCP.
We will use Kubernetes for this lesson.
Google App Engine is a fully managed platform that completely abstracts away infrastructure so you focus only on code.
In this lesson let's spin up a Java application using Google's Interactive Tutorials.
Google Cloud Storage
Storage has a few terms we must be familiar with before we get started.
In this lesson let's define what they are.
In Google Cloud Storage, you create a bucket to store your data. A bucket has three properties that you specify when you create it: a globally unique name, a location where the bucket and its contents are stored, and a default storage class for objects added to the bucket.
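The three bucket properties can be sketched in a few lines of Python. This is only an illustration under some assumptions: the `BucketSpec` class is hypothetical, the name check is a simplified version of the real naming rules, and global uniqueness can only be verified by the service itself. The `gsutil mb` flags used here are `-c` for the storage class and `-l` for the location.

```python
import re
from dataclasses import dataclass

# Simplified naming check (an assumption of this sketch): 3-63 characters,
# lowercase letters, digits, dashes, underscores and dots, starting and
# ending with a letter or digit. The full rules have more cases.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


@dataclass(frozen=True)
class BucketSpec:
    """Hypothetical holder for the three properties of a bucket."""

    name: str           # must be globally unique (not checkable locally)
    location: str       # where the bucket and its contents are stored
    storage_class: str  # default class for objects added to the bucket

    def __post_init__(self):
        if not _NAME_RE.fullmatch(self.name):
            raise ValueError(f"invalid bucket name: {self.name!r}")

    def gsutil_command(self):
        """Build the argument list for a `gsutil mb` call (not executed)."""
        return [
            "gsutil", "mb",
            "-c", self.storage_class,
            "-l", self.location,
            f"gs://{self.name}",
        ]
```

For example, `BucketSpec("my-demo-bucket-12345", "us-central1", "nearline").gsutil_command()` builds the command without running it, so you can inspect it before creating anything.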
In this lesson let's discuss what storage classes are.
A bucket is similar in concept to a folder.
You create a bucket in the cloud and put stuff in it.
In this lesson let's create buckets in GCP.
In this lesson let's move some files around and discuss how permissions are granted to the objects in our buckets.
Download the gsutil cheat sheet here.
In this lesson let's learn about GCP's only supported relational database.
There is a new one on the horizon but it isn't generally available right now.
An instance is an install of MySQL on GCP.
In this lesson let's spin up a new MySQL cloud instance.
In this lesson let's create a .sql file and learn how to create a database and insert some data.
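As a rough idea of what such a file might contain, here is a small Python sketch that builds a MySQL script and writes it to disk. The database name `demo`, the `customers` table, the sample rows and both function names are hypothetical. They are not taken from the lesson.

```python
from pathlib import Path


def build_sql_script(database="demo"):
    """Return a small MySQL script that creates a database and adds rows."""
    return (
        f"CREATE DATABASE IF NOT EXISTS {database};\n"
        f"USE {database};\n"
        "CREATE TABLE IF NOT EXISTS customers (\n"
        "    id INT AUTO_INCREMENT PRIMARY KEY,\n"
        "    name VARCHAR(100) NOT NULL\n"
        ");\n"
        "INSERT INTO customers (name) VALUES ('Ada'), ('Grace');\n"
    )


def write_sql_file(path, database="demo"):
    """Write the script to `path` and return the path as a Path object."""
    target = Path(path)
    target.write_text(build_sql_script(database), encoding="utf-8")
    return target
```

A file written this way could then be fed to the `mysql` command-line client with input redirection once you are connected to your instance.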
In this lesson let's use Cloud Shell to work with MySQL.
In this lesson let's set up an automated backup of our instance using the binary log option.
The Data Section
Google Cloud Pub/Sub delivers low-latency, durable messaging.
While Pub/Sub isn't the easiest service to understand and use, in this lesson we will reduce it down to its core components.
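If it helps to see those components before the demo, here is a toy, in-memory sketch in plain Python. It is not the Google client library and it skips durability, networking and delivery deadlines entirely. The `Topic` and `Subscription` classes and their methods are hypothetical names chosen for this illustration. The idea it shows is that a publisher sends to a topic, every subscription on that topic gets its own copy, and a message stays outstanding until the subscriber acknowledges it.

```python
from collections import deque


class Subscription:
    """Toy subscription: holds its own queue of messages from one topic."""

    def __init__(self, name):
        self.name = name
        self._pending = deque()  # delivered to this subscription, not yet pulled
        self._unacked = {}       # ack_id -> message, pulled but not yet acked
        self._next_ack_id = 0

    def _deliver(self, message):
        self._pending.append(message)

    def pull(self):
        """Return (ack_id, message), or None when nothing is pending."""
        if not self._pending:
            return None
        message = self._pending.popleft()
        ack_id = self._next_ack_id
        self._next_ack_id += 1
        self._unacked[ack_id] = message
        return ack_id, message

    def ack(self, ack_id):
        """Acknowledge a pulled message so it is not delivered again."""
        self._unacked.pop(ack_id, None)

    def nack(self, ack_id):
        """Return an unacknowledged message to the front of the queue."""
        message = self._unacked.pop(ack_id, None)
        if message is not None:
            self._pending.appendleft(message)


class Topic:
    """Toy topic: fans each published message out to every subscription."""

    def __init__(self, name):
        self.name = name
        self._subscriptions = []

    def subscribe(self, name):
        subscription = Subscription(name)
        self._subscriptions.append(subscription)
        return subscription

    def publish(self, message):
        for subscription in self._subscriptions:
            subscription._deliver(message)
```

With two subscriptions on one topic, a single `publish` call makes the message available to both, and a `nack` on one of them only causes redelivery on that subscription.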
In this lesson let's learn about Pub/Sub via a demo.
Google Cloud Datastore is a NoSQL document database built for automatic scaling, high performance, and ease of application development.
In this lesson let's define it.
In this lesson let's demo Datastore.
You'll get a better feel of what it is when you complete the demo.
In this lesson let's define what BigQuery is.
It's easily one of their most popular services.
If you're coming from the relational world you'll love it.
In this lesson let's learn how to use BigQuery.
In this short lesson let's define at a high level what Bigtable is.
In this lesson let's learn how to use Bigtable.
In this brief lesson let's learn what Cloud Dataproc is.
In this lesson let's work through a Dataproc demo.
We will spin up a Dataproc cluster, then submit a Spark job to our cluster.
Python on Google Cloud Platform
In this lesson let's define what Cloud Datalab is and how to initiate a session within GCP.
In this lesson let's create a Datalab session and use that session to create a very simple Python example.
TensorFlow is not the easiest library to understand.
However, because it's a core part of Google's Machine Learning approach and it's on the exam we have to dig into it.
I'll have a full course on this subject but for now, let's find out what it is.