Practical Data Engineering in GCP: Beginner to Advanced


In this course, we will create a data lake using Google Cloud Storage and bring data warehouse capabilities to the data lake, forming a lakehouse architecture with Google BigQuery. We will build four no-code data pipelines using services such as Datastream, Dataflow, Dataprep, Pub/Sub, Data Fusion, Cloud Storage, and BigQuery. The course follows the logical progression of a real-world project implementation, with hands-on experience of setting up a data lake, creating data pipelines for ingestion, and transforming your data in preparation for analytics and reporting. Minimal Python sketches of a few of these steps are included after the chapter outline.

Chapter 1
- Set up a project in Google Cloud
- Introduction to Google Cloud Storage
- Introduction to Google BigQuery

Chapter 2 - Data Pipeline 1
- Create a Cloud SQL database and populate it with data before we start performing complex ETL jobs
- Use Datastream change data capture (CDC) to stream data from our Cloud SQL database into our data lake built on Cloud Storage
- Add a Pub/Sub notification to our bucket (sketched in Python after this outline)
- Create a streaming Dataflow pipeline to load the data into BigQuery (sketched after this outline)

Chapter 3 - Data Pipeline 2
- Introduction to Google Cloud Data Fusion
- Author and monitor ETL jobs that transform our data and move it between the different zones of our data lake
- Explore the use of Wrangler in Data Fusion for profiling and understanding our data before we start performing complex ETL jobs
- Clean and normalise data
- Discover and govern data using metadata in Data Fusion

Chapter 4 - Data Pipeline 3
- Introduction to Google Pub/Sub
- Build a .NET application for publishing data to a Pub/Sub topic (a Python equivalent is sketched after this outline)
- Build a real-time data pipeline for streaming messages into BigQuery

Chapter 5 - Data Pipeline 4
- Introduction to Cloud Dataprep
- Profile, author, and monitor ETL jobs that transform our data using Dataprep
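To give a flavour of the hands-on work, here is a minimal sketch of the Chapter 1 setup using the Google Cloud Python client libraries. The project ID, bucket name, and dataset name are placeholders; in the course itself this is done through the Cloud Console.

    from google.cloud import bigquery, storage

    PROJECT_ID = "my-gcp-project"  # placeholder: substitute your own project

    # Create a Cloud Storage bucket to act as the data lake's landing zone.
    storage_client = storage.Client(project=PROJECT_ID)
    bucket = storage_client.create_bucket("my-data-lake-bucket", location="EU")

    # Create a BigQuery dataset to bring warehouse capabilities to the lake.
    bq_client = bigquery.Client(project=PROJECT_ID)
    dataset = bigquery.Dataset(f"{PROJECT_ID}.lakehouse")
    dataset.location = "EU"
    bq_client.create_dataset(dataset, exists_ok=True)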
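The Chapter 2 bucket notification can also be created programmatically. A minimal sketch with the same Python storage client, assuming the bucket and a Pub/Sub topic named data-lake-events already exist (both names are placeholders) and that the Cloud Storage service account has publish rights on the topic:

    from google.cloud import storage

    storage_client = storage.Client(project="my-gcp-project")  # placeholder
    bucket = storage_client.bucket("my-data-lake-bucket")      # placeholder

    # Publish a Pub/Sub message each time a new object lands in the bucket.
    notification = bucket.notification(
        topic_name="data-lake-events",    # placeholder topic
        event_types=["OBJECT_FINALIZE"],  # fire only on new/overwritten objects
        payload_format="JSON_API_V1",     # include object metadata in the message
    )
    notification.create()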
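In the course, the Chapter 2 streaming job is built without code, from a Dataflow template; under the hood it corresponds to a small Apache Beam pipeline along these lines. The subscription and table names are placeholders, and the target table is assumed to already exist:

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Streaming pipeline: Pub/Sub -> parse JSON -> append rows to BigQuery.
    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-gcp-project/subscriptions/data-lake-sub")
            | "ParseJson" >> beam.Map(json.loads)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "my-gcp-project:lakehouse.events",
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )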
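Chapter 4 builds the publisher as a .NET application; to keep these sketches in a single language, here is the equivalent idea in Python, with placeholder project and topic names:

    import json

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-gcp-project", "orders")  # placeholders

    # publish() takes raw bytes and returns a future that resolves to the
    # server-assigned message ID once the message is accepted.
    message = {"order_id": 123, "amount": 19.99}
    future = publisher.publish(topic_path, json.dumps(message).encode("utf-8"))
    print(future.result())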