Overview

Get hands-on experience designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, and analyze data. This course covers structured, unstructured, and streaming data.

Products:

  • BigQuery
  • Bigtable
  • Cloud Storage
  • Cloud SQL
  • Spanner
  • Dataproc
  • Dataflow
  • Cloud Data Fusion
  • Cloud Composer
  • Pub/Sub

Prerequisites

Participants should have:

  • Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.
  • Basic proficiency with a common query language such as SQL.
  • Experience with data modeling and ETL (extract, transform, load) activities.
  • Experience developing applications using a common programming language such as Python.

Target audience

This course is designed for:

  • Data engineers
  • Database administrators
  • System administrators

Learning Outcomes

By the end of this course, learners will be able to:

  • Design and build data processing systems on Google Cloud.
  • Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
  • Derive business insights from extremely large datasets using BigQuery.
  • Leverage unstructured data using Spark and ML APIs on Dataproc.
  • Enable instant insights from streaming data.

Course Outline

Module 01: Data engineering tasks and components

  • The role of a data engineer
  • Data sources versus data sinks
  • Data formats
  • Storage solution options on Google Cloud
  • Metadata management options on Google Cloud
  • Share datasets using Analytics Hub

Module 02: Data replication and migration

  • Replication and migration architecture
  • The gcloud command-line tool
  • Moving datasets
  • Datastream

Module 03: The extract and load data pipeline pattern

  • Extract and load architecture
  • The bq command-line tool
  • BigQuery Data Transfer Service
  • BigLake

Module 04: The extract, load, and transform data pipeline pattern

  • Extract, load, and transform (ELT) architecture
  • SQL scripting and scheduling with BigQuery
  • Dataform

Module 05: The extract, transform, and load data pipeline pattern

  • Extract, transform, and load (ETL) architecture
  • Google Cloud GUI tools for ETL data pipelines
  • Batch data processing using Dataproc
  • Streaming data processing options
  • Bigtable and data pipelines

Module 06: Automation techniques

  • Automation patterns and options for pipelines
  • Cloud Scheduler and Workflows
  • Cloud Composer
  • Cloud Run functions
  • Eventarc

Module 07: Introduction to data engineering

  • Data engineer’s role
  • Data engineering challenges
  • Introduction to BigQuery
  • Data lakes and data warehouses
  • Transactional databases versus data warehouses
  • Effective partnership with other data teams
  • Management of data access and governance
  • Building of production-ready pipelines
  • Google Cloud customer case study

Module 08: Build a data lake

  • Introduction to data lakes
  • Data storage and ETL options on Google Cloud
  • Building of a data lake using Cloud Storage
  • Secure Cloud Storage
  • Store all sorts of data types
  • Cloud SQL as your OLTP system

Module 09: Build a data warehouse

  • The modern data warehouse
  • Introduction to BigQuery
  • Get started with BigQuery
  • Loading of data into BigQuery
  • Exploration of schemas
  • Schema design
  • Nested and repeated fields
  • Optimization with partitioning and clustering

Module 10: Introduction to building batch data pipelines

  • EL, ELT, ETL
  • Quality considerations
  • Ways of executing operations in BigQuery
  • Shortcomings
  • ETL to solve data quality issues

Module 11: Execute Spark on Dataproc

  • The Hadoop ecosystem
  • Run Hadoop on Dataproc
  • Cloud Storage instead of HDFS
  • Optimize Dataproc

Module 12: Serverless data processing with Dataflow

  • Introduction to Dataflow
  • Reasons why customers value Dataflow
  • Dataflow pipelines
  • Aggregating with GroupByKey and Combine
  • Side inputs and windows
  • Dataflow templates

Module 13: Manage data pipelines with Cloud Data Fusion and Cloud Composer

  • Build batch data pipelines visually with Cloud Data Fusion
  • Components
  • Overview
  • Building a pipeline
  • Exploring data using Wrangler
  • Orchestrate work between Google Cloud services with Cloud Composer
  • Apache Airflow environment
  • DAGs and operators
  • Workflow scheduling
  • Monitoring and logging

Module 14: Serverless messaging with Pub/Sub

  • Introduction to Pub/Sub
  • Pub/Sub push versus pull
  • Publishing with Pub/Sub code

Module 16: Dataflow streaming features

  • Streaming data challenges
  • Dataflow windowing

Module 17: High-throughput BigQuery and Bigtable streaming features

  • Streaming into BigQuery and visualizing results
  • High-throughput streaming with Bigtable
  • Optimizing Bigtable performance

Module 18: Advanced BigQuery functionality and performance

  • Analytic window functions
  • GIS functions
  • Performance considerations

Exams and assessments

There is no specific certification related to this course.

Hands-on learning

There are practical labs in this course.

Frequently asked questions

How can I create an account on myQA.com?

There are a number of ways to create an account. If you are a self-funder, simply select the "Create account" option on the login page.

If you have been booked onto a course by your company, you will receive a confirmation email. From this email, select "Sign into myQA" and you will be taken to the "Create account" page. Complete all of the details and select "Create account".

If you have the booking number you can also go here and select the "I have a booking number" option. Enter the booking reference and your surname. If the details match, you will be taken to the "Create account" page from where you can enter your details and confirm your account.

Find more answers to frequently asked questions in our FAQs: Bookings & Cancellations page.

How do QA’s virtual classroom courses work?

Our virtual classroom courses allow you to access award-winning classroom training, without leaving your home or office. Our learning professionals are specially trained on how to interact with remote attendees and our remote labs ensure all participants can take part in hands-on exercises wherever they are.

We use the WebEx video conferencing platform by Cisco. Before you book, check that you meet the WebEx system requirements and run a test meeting to ensure the software is compatible with your firewall settings. If it doesn’t work, try adjusting your settings or contact your IT department about permitting the website.

How do QA’s online courses work?

QA online courses, also known as distance learning or e-learning courses, take the form of interactive software designed for individual learning. You will also have access to full support from our subject-matter experts for the duration of your course.

Once you have purchased the online course and completed your registration, you will receive the details you need to access it immediately through our e-learning platform, so you can start learning straight away from any compatible device. Access to the online learning platform is valid for one year from the booking date.

All courses are built around case studies and presented in an engaging format, which includes storytelling elements, video, audio and humour. Every case study is supported by sample documents and a collection of Knowledge Nuggets that provide more in-depth detail on the wider processes.

When will I receive my joining instructions?

Joining instructions for QA courses are sent two weeks prior to the course start date, or immediately if the booking is confirmed within this timeframe. For course bookings made via QA but delivered by a third-party supplier, joining instructions are sent to attendees prior to the training course, but timescales vary depending on each supplier’s terms.

When will I receive my certificate?

Certificates of Achievement are issued at the end of the course, either as a hard copy or via email.
