Overview
Data pipelines typically fall under one of the Extract-Load (EL), Extract-Load-Transform (ELT), or Extract-Transform-Load (ETL) paradigms. This course describes which paradigm should be used, and when, for batch data. It also covers several Google Cloud technologies for data transformation, including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
Outline
Introduction to Building Batch Data Pipelines
This module reviews the different methods of data loading: EL, ELT, and ETL, and explains when to use each.
- Module introduction
- EL, ELT, ETL
- Quality considerations
- How to carry out operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
- QUIZ: Introduction to Building Batch Data Pipelines
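The EL/ELT/ETL distinction in this module can be sketched in code. The following is a minimal, hypothetical Python example (plain stdlib, no Google Cloud calls, invented field names): in EL the rows would be loaded as-is, in ELT the same cleanup would be expressed later as SQL inside BigQuery, while ETL applies quality fixes before the load.

```python
# Hypothetical sketch of the ETL "transform" step that fixes data
# quality issues (missing keys, unparseable values) before loading
# into a warehouse. Field names ("trip_id", "fare") are illustrative.

def transform(rows):
    """Keep only valid rows and coerce fares to floats (ETL-style cleanup)."""
    cleaned = []
    for row in rows:
        if not row.get("trip_id"):          # drop rows missing a key field
            continue
        try:
            fare = float(row["fare"])       # coerce string fares to numbers
        except (KeyError, ValueError):
            continue                        # drop rows with unparseable fares
        cleaned.append({"trip_id": row["trip_id"], "fare": fare})
    return cleaned

raw = [
    {"trip_id": "a1", "fare": "12.50"},
    {"trip_id": "",   "fare": "9.00"},      # invalid: empty key
    {"trip_id": "b2", "fare": "oops"},      # invalid: bad fare
]
print(transform(raw))  # only the first row survives
```

The same cleanup done after loading raw rows (as a SQL `SAFE_CAST` plus a `WHERE` filter) would be the ELT variant the module contrasts this with.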
Executing Spark on Dataproc
This module shows how to run Hadoop on Dataproc, how to leverage Cloud Storage, and how to optimize your Dataproc jobs.
- Module introduction
- The Hadoop ecosystem
- Running Hadoop on Dataproc
- Cloud Storage instead of HDFS
- Optimizing Dataproc
- Optimizing Dataproc storage
- Optimizing Dataproc templates and autoscaling
- Optimizing Dataproc monitoring
- Lab Intro: Running Apache Spark jobs on Dataproc
- LAB: Running Apache Spark jobs on Cloud Dataproc: This lab focuses on running Apache Spark jobs on Cloud Dataproc.
- Summary
- QUIZ
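A key point in this module is that moving Hadoop jobs to Dataproc often means pointing them at Cloud Storage instead of cluster-local HDFS. As a hedged illustration (a hypothetical helper, not a Dataproc API), switching storage can be as simple as rewriting `hdfs://` URIs to `gs://`:

```python
# Hypothetical illustration: migrating a Hadoop job's input/output URIs
# from cluster-local HDFS to Cloud Storage usually means swapping the
# scheme and bucket; the job logic itself is unchanged.

def to_gcs(uri: str, bucket: str) -> str:
    """Rewrite an hdfs:// URI to a gs:// URI in the given bucket."""
    prefix = "hdfs://"
    if uri.startswith(prefix):
        # drop the namenode host:port, keep the object path
        path = uri[len(prefix):].split("/", 1)[1]
        return f"gs://{bucket}/{path}"
    return uri  # already gs:// (or another scheme); leave untouched

print(to_gcs("hdfs://namenode:8020/data/logs/2024/", "my-bucket"))
# gs://my-bucket/data/logs/2024/
```

Decoupling storage from the cluster this way is what lets Dataproc clusters be ephemeral, which the optimization lessons above build on.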
Serverless Data Processing with Dataflow
This module covers using Dataflow to build your data processing pipelines.
- Module introduction
- Introduction to Dataflow
- Why customers value Dataflow
- Building Dataflow pipelines in code
- Key considerations with designing pipelines
- Transforming data with PTransforms
- Lab Intro: Building a Simple Dataflow Pipeline
- LAB: A Simple Dataflow Pipeline (Python) 2.5: In this lab, you learn how to write a simple Dataflow pipeline and run it both locally and on the cloud.
- LAB: Serverless Data Analysis with Dataflow: A Simple Dataflow Pipeline (Java): In this lab you will open a Dataflow project, use pipeline filtering, and execute the pipeline locally and on the cloud using Java.
- Aggregate with GroupByKey and Combine
- Lab Intro: MapReduce in Beam
- LAB: MapReduce in Beam (Python) 2.5: In this lab, you learn how to use pipeline options and carry out Map and Reduce operations in Dataflow.
- LAB: Serverless Data Analysis with Beam: MapReduce in Beam (Java): In this lab you will identify Map and Reduce operations, execute the pipeline, and use command-line parameters.
- Side inputs and windows of data
- Lab Intro: Practicing Pipeline Side Inputs
- LAB: Serverless Data Analysis with Dataflow: Side Inputs (Python): In this lab you will try out a BigQuery query, explore the pipeline code, and execute the pipeline using Python.
- LAB: Serverless Data Analysis with Dataflow: Side Inputs (Java): In this lab you will try out a BigQuery query, explore the pipeline code, and execute the pipeline using Java.
- Creating and re-using pipeline templates
- Summary
- QUIZ
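The GroupByKey and Combine transforms covered in this module can be modelled in plain Python without Apache Beam installed. This is a conceptual sketch of the semantics only, not the Beam API; in a real pipeline these would be `beam.GroupByKey()` and `beam.CombinePerKey(sum)` applied to a PCollection:

```python
from collections import defaultdict

# Conceptual model of Beam's GroupByKey followed by a per-key Combine.
# The word-count data mirrors the MapReduce in Beam lab.

def group_by_key(pairs):
    """Collect (key, value) pairs into key -> [values] (like GroupByKey)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def combine_per_key(grouped, fn=sum):
    """Reduce each key's list of values with fn (like CombinePerKey)."""
    return {key: fn(values) for key, values in grouped.items()}

word_counts = [("cat", 1), ("dog", 1), ("cat", 1)]
grouped = group_by_key(word_counts)    # {"cat": [1, 1], "dog": [1]}
print(combine_per_key(grouped))        # {"cat": 2, "dog": 1}
```

In Beam, preferring a Combine over a bare GroupByKey plus manual reduction matters for performance, since combiners can pre-aggregate on each worker before shuffling.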
Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
This module shows how to manage data pipelines with Cloud Data Fusion and Cloud Composer.
- Module introduction
- Introduction to Cloud Data Fusion
- Components of Cloud Data Fusion
- Cloud Data Fusion UI
- Build a pipeline
- Explore data using Wrangler
- Lab Intro: Building and executing a pipeline graph in Cloud Data Fusion
- LAB: Building and Executing a Pipeline Graph with Data Fusion 2.5: This tutorial shows you how to use the Wrangler and Data Pipeline features in Cloud Data Fusion to clean, transform, and process taxi trip data for further analysis.
- Orchestrate work between Google Cloud services with Cloud Composer
- Apache Airflow environment
- DAGs and Operators
- Workflow scheduling
- Monitoring and Logging
- Lab Intro: An Introduction to Cloud Composer
- LAB: An Introduction to Cloud Composer 2.5: In this lab, you create a Cloud Composer environment using the GCP Console. You then use the Airflow web interface to run a workflow that verifies a data file, creates and runs an Apache Hadoop wordcount job on a Dataproc cluster, and deletes the cluster.
- QUIZ
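A Composer workflow is an Airflow DAG of operators: each downstream task runs only after its upstream tasks finish. As a hedged, stdlib-only sketch (not the Airflow API), the lab's verify-file / create-cluster / run-wordcount / delete-cluster workflow reduces to a topological ordering of task dependencies:

```python
# Stdlib-only sketch of the DAG idea behind Cloud Composer / Airflow.
# graphlib.TopologicalSorter plays the role of the scheduler deciding
# a valid execution order; task names mirror the wordcount lab.
from graphlib import TopologicalSorter

# task -> set of upstream tasks it depends on
dag = {
    "create_cluster": {"verify_input_file"},
    "run_wordcount":  {"create_cluster"},
    "delete_cluster": {"run_wordcount"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)
# ['verify_input_file', 'create_cluster', 'run_wordcount', 'delete_cluster']
```

In Airflow itself the same chain would be declared with operators and the `>>` dependency syntax; the scheduler then derives an execution order exactly as the sorter does here.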
Frequently asked questions
How can I create an account on myQA.com?
There are a number of ways to create an account. If you are a self-funder, simply select the "Create account" option on the login page.
If you have been booked onto a course by your company, you will receive a confirmation email. From this email, select "Sign into myQA" and you will be taken to the "Create account" page. Complete all of the details and select "Create account".
If you have the booking number you can also go here and select the "I have a booking number" option. Enter the booking reference and your surname. If the details match, you will be taken to the "Create account" page from where you can enter your details and confirm your account.
Find more answers to frequently asked questions in our FAQs: Bookings & Cancellations page.
How do QA’s virtual classroom courses work?
Our virtual classroom courses allow you to access award-winning classroom training, without leaving your home or office. Our learning professionals are specially trained on how to interact with remote attendees and our remote labs ensure all participants can take part in hands-on exercises wherever they are.
We use the WebEx video conferencing platform by Cisco. Before you book, check that you meet the WebEx system requirements and run a test meeting to ensure the software is compatible with your firewall settings. If it doesn’t work, try adjusting your settings or contact your IT department about permitting the website.
How do QA’s online courses work?
QA online courses, also commonly known as distance learning courses or elearning courses, take the form of interactive software designed for individual learning, but you will also have access to full support from our subject-matter experts for the duration of your course. When you book a QA online learning course you will receive immediate access to it through our e-learning platform and you can start to learn straight away, from any compatible device. Access to the online learning platform is valid for one year from the booking date.
All courses are built around case studies and presented in an engaging format, which includes storytelling elements, video, audio and humour. Every case study is supported by sample documents and a collection of Knowledge Nuggets that provide more in-depth detail on the wider processes.
When will I receive my joining instructions?
Joining instructions for QA courses are sent two weeks prior to the course start date, or immediately if the booking is confirmed within this timeframe. For course bookings made via QA but delivered by a third-party supplier, joining instructions are sent to attendees prior to the training course, but timescales vary depending on each supplier’s terms. Read more FAQs.
When will I receive my certificate?
Certificates of Achievement are issued at the end of the course, either as a hard copy or via email. Read more here.