Hands-On Amazon Redshift for Data Warehousing
Amazon Redshift is a low-cost cloud data platform that can scale from gigabytes to petabytes on a high-performance, column-oriented SQL engine. Amazon Redshift brings the power of scale-out architecture to the world of traditional data warehousing.
In this course, you will explore this low-cost, cloud-based storage, which can be scaled up or down to meet your true size and performance needs. You will learn to configure a sample data warehouse. Next, you will explore Redshift’s internal workings and architecture, and learn what makes it so fast. You will get hands-on experience connecting, querying, and building BI and data viz products and learn how to secure, maintain, and administer your new platform.
By the end of this course, you will be able to scale from gigabytes to petabytes on this high-performance, column-oriented SQL engine.
About The Author
Colibri Digital is a technology consultancy company founded in 2015 by James and Ingrid Cross. The company works to help its clients navigate the rapidly changing and complex world of emerging technologies, with deep expertise in areas like big data, data science, machine learning, and cloud computing.
Over the past few years they have worked with some of the world’s largest and most prestigious companies, including tier 1 investment banks, a leading management consultancy group, and one of the world’s most popular soft drinks companies, helping each of them to better make sense of its data, and process it in more intelligent ways.
At the frontier of AI, big data and cloud computing, we are Colibri Digital.
James Cross is a big data engineer and certified AWS solutions architect with a passion for data-driven applications. He’s spent the last 3-5 years helping his clients to design and implement huge-scale streaming big data platforms, cloud-based analytics stacks, and serverless architectures.
He started his professional career in investment banking, working with well-established technologies such as Java and SQL Server, before moving into the big data space. Since then, he’s worked with a huge range of big data tools including most of the Hadoop ecosystem, Spark, and many NoSQL technologies such as Cassandra, MongoDB, Redis, and DynamoDB. More recently, his focus has been on cloud technologies and how they can be applied to data analytics, culminating in his work at Scout Solutions as CTO, and more recently with McKinsey.
James is an AWS-certified solutions architect with several years’ experience designing and implementing solutions on this cloud platform. As the CTO of Scout Solutions Ltd, he built a fully serverless set of APIs and an analytics stack based around Lambda and Redshift.
Data Warehousing for the Internet Age
This video gives a glimpse of the entire course.
Understanding the use cases for data warehousing in a modern data landscape.
• Understand the data landscape today
• Understand the case for BI
• Have a look at the BI use cases
Understanding the modern data landscape and the technologies that make it up.
• Understand NoSQL
• Understand big data
• Understand RDBMS
Understanding the BI use case in detail and how to solve that problem on large datasets.
• Have a look at the BI use case
• Scale up the BI tools
Introducing cloud-native BI data warehouse tools like Redshift.
• Go through the cloud-native data tools
• Introduction to Redshift
• Cloud native data warehousing
Getting Started with Redshift
Introduction to the AWS console and how to use it to launch a Redshift cluster.
• Understand the AWS Console
• Launch a Redshift cluster
• Connect to Redshift
Introduction to CloudFormation, and how to use it to launch a Redshift cluster.
• Get introduced to CloudFormation
• Launch a cluster with CloudFormation
• Terminate a cluster with CloudFormation
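As a rough idea of what launching a cluster from a template involves, the sketch below builds a minimal CloudFormation template in Python and serializes it to JSON. The resource name `SampleCluster`, the `dc2.large` node type, the `dev` database, and the `awsuser` user name are illustrative assumptions, not values taken from the course.

```python
import json

# Minimal CloudFormation template for a single-node Redshift cluster.
# All names and sizes below are placeholders chosen for illustration.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sample single-node Redshift cluster (illustrative)",
    "Parameters": {
        # NoEcho keeps the password out of console and API output.
        "MasterUserPassword": {"Type": "String", "NoEcho": "true"},
    },
    "Resources": {
        "SampleCluster": {
            "Type": "AWS::Redshift::Cluster",
            "Properties": {
                "ClusterType": "single-node",
                "NodeType": "dc2.large",  # assumed node type
                "DBName": "dev",
                "MasterUsername": "awsuser",
                "MasterUserPassword": {"Ref": "MasterUserPassword"},
            },
        }
    },
}

# The JSON text is what would be uploaded as the stack template.
template_json = json.dumps(template, indent=2)
print(template_json)
```

Deleting the stack created from such a template also terminates the cluster it defines, which is why launching and terminating are paired in this section.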
Understanding why technologies like columnar storage enable the scale out of data warehouses.
• Differentiate between columnar and tabular (row-oriented) storage
• Understand why Columnar can be used to accelerate BI queries
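The difference between the two layouts can be shown with a toy model. The sketch below is not how Redshift stores data internally; it only counts how many values an aggregate has to touch under each layout, using made-up sample rows.

```python
# Toy comparison of row-oriented and column-oriented layouts (illustrative only).
rows = [
    {"title": "A", "year": 1999, "rating": 7.1},
    {"title": "B", "year": 2005, "rating": 8.3},
    {"title": "C", "year": 2005, "rating": 6.4},
]


def avg_rating_row_store(records):
    """Row layout: each record keeps all of its fields together."""
    values_read = 0
    total = 0.0
    for record in records:
        values_read += len(record)  # the whole record is read to reach one field
        total += record["rating"]
    return total / len(records), values_read


# Column layout: each column is stored as its own contiguous list.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def avg_rating_column_store(cols):
    """Column layout: only the one column the query needs is read."""
    ratings = cols["rating"]
    return sum(ratings) / len(ratings), len(ratings)


row_avg, row_values_read = avg_rating_row_store(rows)
col_avg, col_values_read = avg_rating_column_store(columns)
print(row_values_read, col_values_read)
```

Both functions return the same average, but the column layout reads a third of the values here. BI queries that aggregate a few columns over many rows benefit from the same effect at much larger scale.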
Understanding why technologies like MPP enable the scale out of data warehouses.
• Get introduced to the MPP (massively parallel processing) concept
• Scale out queries using MPP
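The MPP idea can be sketched as scatter and gather: rows are spread across workers, each worker aggregates its own share, and a coordinator merges the partial results. The code below simulates this in a single process; the slice count and sample rows are assumptions for illustration, and the real engine runs the partial steps in parallel on separate nodes.

```python
from collections import Counter


def split_into_slices(rows, slice_count):
    """Scatter rows round-robin across a fixed number of slices."""
    slices = [[] for _ in range(slice_count)]
    for index, row in enumerate(rows):
        slices[index % slice_count].append(row)
    return slices


def partial_count_by_year(slice_rows):
    """Work done independently on one slice: count titles per year."""
    return Counter(year for year, _title in slice_rows)


def merge_partials(partials):
    """Gather step: combine the per-slice counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


rows = [(2005, "A"), (1999, "B"), (2005, "C"), (2010, "D"), (1999, "E"), (2005, "F")]
slices = split_into_slices(rows, 3)
result = merge_partials(partial_count_by_year(s) for s in slices)
print(dict(result))
```

Because each slice only sees its own rows, adding slices reduces the work per slice, which is the basis for scaling queries out rather than up.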
Creating a Redshift Data Warehouse from Disparate Data Sets
What we need in a source data set and what we're trying to achieve with it.
• Learn what we need in a data set
• Learn what we will do with the data set
• Download the IMDB data set
Understanding how to load data of various sizes into our new DWH cluster.
• Upload data to S3
• Copy data to Redshift
• Query data
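The copy step typically comes down to a single Redshift `COPY` statement that points at the uploaded file. The helper below only assembles that statement as text. The table name, bucket path, and IAM role ARN are hypothetical, and it assumes a CSV file with one header row.

```python
def build_copy_statement(table, s3_path, iam_role_arn):
    """Return a Redshift COPY statement for a CSV file with a header row.

    Inputs are interpolated directly, so this is only suitable for
    trusted, hand-written values such as the placeholders below.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


# Hypothetical table, bucket, and role, used only to show the shape.
statement = build_copy_statement(
    "titles",
    "s3://example-bucket/imdb/titles.csv",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The printed statement can be run from any SQL client connected to the cluster, after which the table can be queried like any other.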
Understanding how to connect to Redshift and run basic join queries against it.
• Connect to Redshift
• Execute queries
Understanding why technologies like query caching enable the scale out of data warehouses.
• Get introduced to the query caching concept
• Query caching at scale with Redshift
Optimizing Redshift for Scale
Learn how to use manifest files to copy vast amounts of data efficiently into Redshift.
• Learn how Redshift runs a COPY command
• Understand manifest files
• Leverage manifests to copy data
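A manifest is a small JSON file that lists exactly which S3 objects a `COPY` should load. The sketch below builds one and the matching statement; the bucket, file names, table, and role ARN are placeholders assumed for illustration.

```python
import json


def build_manifest(s3_urls):
    """Build a COPY manifest: one entry per file, each marked mandatory."""
    return {"entries": [{"url": url, "mandatory": True} for url in s3_urls]}


def build_manifest_copy(table, manifest_path, iam_role_arn):
    """Return a COPY statement that reads its file list from a manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )


# Hypothetical file parts; splitting the data lets slices load in parallel.
manifest = build_manifest(
    [
        "s3://example-bucket/imdb/titles.part0.csv",
        "s3://example-bucket/imdb/titles.part1.csv",
    ]
)
manifest_json = json.dumps(manifest, indent=2)
copy_sql = build_manifest_copy(
    "titles",
    "s3://example-bucket/imdb/titles.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(manifest_json)
print(copy_sql)
```

With `mandatory` set to true, the load fails if a listed file is missing, which guards against silently loading a partial data set.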
Understand the different data types Redshift can use and how they impact query performance.
• Explore the Compression types
• Learn what to use and when
Learn how to make the most of the MPP concept by avoiding data skew.
• Learn about MPP and data skew
• Ensure even distribution with distribution keys
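Data skew can be illustrated by hashing rows to slices on two different keys and comparing the row counts per slice. The simulation below uses SHA-256 as a stand-in hash; Redshift's own hash function, the slice count of 4, and the sample columns `country` and `title_id` are all assumptions made for this sketch.

```python
import hashlib
from collections import Counter


def slice_for(value, slice_count):
    """Map a key value to a slice using a stand-in hash (not Redshift's)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % slice_count


def rows_per_slice(rows, key, slice_count):
    """Count how many rows land on each slice when distributing by key."""
    counts = Counter(slice_for(row[key], slice_count) for row in rows)
    return [counts.get(i, 0) for i in range(slice_count)]


def skew_ratio(counts):
    """Largest slice relative to the average slice; 1.0 means perfectly even."""
    return max(counts) / (sum(counts) / len(counts))


# 1000 sample rows: 'country' has two values, 'title_id' is unique per row.
rows = [{"title_id": i, "country": "US" if i % 10 else "GB"} for i in range(1000)]

by_country = rows_per_slice(rows, "country", 4)
by_title = rows_per_slice(rows, "title_id", 4)
print(by_country, skew_ratio(by_country))
print(by_title, skew_ratio(by_title))
```

A low-cardinality key such as `country` leaves most slices idle while one does nearly all the work, whereas a high-cardinality key spreads rows far more evenly. That is the reasoning behind choosing distribution keys with many distinct, evenly used values.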
Connecting Redshift with Disconnected Data Using Redshift Spectrum
The use case for tools like Spectrum.
• Understand the issues with BI tools
• Use cases for Spectrum
How to load a disconnected data set into Redshift.
• Load data to S3
How to create an external database and schema for data sets on S3.
• Create an external DB
• Create an external schema and table
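The external database, schema, and table are created with SQL statements along the following lines. The sketch holds them as Python strings so they can be passed to any SQL client. The schema name, database name, columns, bucket path, and role ARN are hypothetical, and the table definition assumes comma-delimited text files.

```python
# Hypothetical names used throughout; replace with real values before use.
IAM_ROLE = "arn:aws:iam::123456789012:role/ExampleSpectrumRole"

# Creates the external schema and, if needed, the external database
# in the data catalog that backs it.
create_schema_sql = (
    "CREATE EXTERNAL SCHEMA spectrum_schema\n"
    "FROM DATA CATALOG\n"
    "DATABASE 'spectrum_db'\n"
    f"IAM_ROLE '{IAM_ROLE}'\n"
    "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
)

# Defines a table over files that stay on S3; no data is loaded into the cluster.
create_table_sql = (
    "CREATE EXTERNAL TABLE spectrum_schema.ratings (\n"
    "    title_id VARCHAR(16),\n"
    "    rating DOUBLE PRECISION\n"
    ")\n"
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
    "STORED AS TEXTFILE\n"
    "LOCATION 's3://example-bucket/ratings/';"
)

print(create_schema_sql)
print(create_table_sql)
```

Once both statements have run, `spectrum_schema.ratings` can be queried and joined with local Redshift tables while the underlying files remain on S3.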
Visualizing Your Results with Amazon QuickSight
Reviewing the use case for BI.
• Understand Business Intelligence
• Get intelligence from data
What is QuickSight and where does it fit in?
• Explore the visualization tool options
• Get introduced to QuickSight
The typical problem with BI tools and how to solve it.
• Understand the problem with slow BI tools
• Accelerate BI queries
• Get introduced to SPICE
How to load data into SPICE and visualize it.
• Visualize data on Redshift
• Compare performance to data loaded into SPICE