Overview
In this course, students will learn how to implement and manage data engineering workloads on Microsoft Azure, using services such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Stream Analytics, and Azure Databricks.
The course focuses on common data engineering tasks such as orchestrating data transfer and transformation pipelines, working with data files in a data lake, creating and loading relational data warehouses, capturing and aggregating streams of real-time data, and tracking data assets and lineage.
Prerequisites
Successful students start this course with knowledge of cloud computing and core data concepts, as well as professional experience with data solutions.
You can gain this knowledge by attending the following courses:
Microsoft Azure Fundamentals (MAZ900)
Microsoft Azure Data Fundamentals (MDP900)
Attending these courses is not a mandatory prerequisite.
Learning Outcomes
- Explore compute and storage options for data engineering workloads in Azure.
- Run interactive queries using serverless SQL pools.
- Perform data exploration and transformation in Azure Databricks.
- Explore, transform, and load data into the data warehouse using Apache Spark.
- Ingest and load data into the data warehouse.
- Transform data with Azure Data Factory or Azure Synapse Pipelines.
- Integrate data from notebooks with Azure Data Factory or Azure Synapse Pipelines.
- Support hybrid transactional/analytical processing (HTAP) with Azure Synapse Link.
- Implement end-to-end security with Azure Synapse Analytics.
- Perform real-time stream processing with Azure Stream Analytics.
- Create a stream processing solution with Event Hubs and Azure Databricks.
Course Outline
Module 1: Introduction to data engineering on Azure
Learning objectives
- Identify common data engineering tasks
- Describe common data engineering concepts
- Identify Azure services for data engineering
Module 2: Introduction to Azure Data Lake Storage Gen2
Learning objectives
- Describe the key features and benefits of Azure Data Lake Storage Gen2
- Enable Azure Data Lake Storage Gen2 in an Azure Storage account
- Compare Azure Data Lake Storage Gen2 and Azure Blob storage
- Describe where Azure Data Lake Storage Gen2 fits in the stages of analytical processing
- Describe how Azure Data Lake Storage Gen2 is used in common analytical workloads
Module 3: Introduction to Azure Synapse Analytics
Learning objectives
- Identify the business problems that Azure Synapse Analytics addresses.
- Describe core capabilities of Azure Synapse Analytics.
- Determine when to use Azure Synapse Analytics.
Module 4: Use Azure Synapse serverless SQL pool to query files in a data lake
Learning objectives
- Identify capabilities and use cases for serverless SQL pools in Azure Synapse Analytics
- Query CSV, JSON, and Parquet files using a serverless SQL pool (example below)
- Create external database objects in a serverless SQL pool
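For illustration, the kind of query this module covers reads files in the lake directly with OPENROWSET. A minimal sketch, assuming a hypothetical storage account and folder of Parquet files:

```sql
-- Query Parquet files in the data lake from a serverless SQL pool.
-- The storage account, container, and path are hypothetical.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/files/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;
```

CSV files can be read the same way with FORMAT = 'CSV'.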
Module 5: Use Azure Synapse serverless SQL pools to transform data in a data lake
Learning objectives
- Use a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement to transform data (example below).
- Encapsulate a CETAS statement in a stored procedure.
- Include a data transformation stored procedure in a pipeline.
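As a hedged sketch of the CETAS pattern taught here (the external data source and file format are hypothetical objects of the kind created in Module 4, and the paths are placeholders):

```sql
-- Transform CSV source files and persist the result as Parquet in the lake.
-- my_lake_source and parquet_format are hypothetical external objects.
CREATE EXTERNAL TABLE dbo.ProductSalesTotals
    WITH (
        LOCATION = 'curated/product_sales/',
        DATA_SOURCE = my_lake_source,
        FILE_FORMAT = parquet_format
    )
AS
SELECT ProductID, SUM(Quantity) AS TotalQuantity
FROM OPENROWSET(
    BULK 'raw/sales/*.csv',
    DATA_SOURCE = 'my_lake_source',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS orders
GROUP BY ProductID;
```

Wrapping such a statement in a stored procedure makes it callable from a pipeline activity, which is how the module connects these objectives.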
Module 6: Create a lake database in Azure Synapse Analytics
Learning objectives
- Understand lake database concepts and components
- Describe database templates in Azure Synapse Analytics
- Create a lake database
Module 7: Analyse data with Apache Spark in Azure Synapse Analytics
Learning objectives
- Identify core features and capabilities of Apache Spark.
- Configure a Spark pool in Azure Synapse Analytics.
- Run code to load, analyse, and visualise data in a Spark notebook.
Module 8: Transform data with Spark in Azure Synapse Analytics
Learning objectives
- Use Apache Spark to modify and save dataframes.
- Partition data files for improved performance and scalability.
- Transform data with SQL (example below).
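For example, a Spark SQL cell in a Synapse notebook can transform data and write it out as a partitioned table in one statement. A minimal sketch with hypothetical table and column names:

```sql
-- Spark SQL (e.g. in a %%sql notebook cell): create a partitioned
-- Parquet table from transformed data. All names are hypothetical.
CREATE TABLE sales_by_year
USING PARQUET
PARTITIONED BY (order_year)
AS
SELECT item, quantity, price, YEAR(order_date) AS order_year
FROM raw_sales;
```

Partitioning on a commonly filtered column such as year lets queries skip irrelevant files.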
Module 9: Use Delta Lake in Azure Synapse Analytics
Learning objectives
- Describe core features and capabilities of Delta Lake.
- Create and use Delta Lake tables in a Synapse Analytics Spark pool (example below).
- Create Spark catalog tables for Delta Lake data.
- Use Delta Lake tables for streaming data.
- Query Delta Lake tables from a Synapse Analytics SQL pool.
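A minimal Spark SQL sketch of a Delta Lake catalog table (the table, columns, and values are hypothetical):

```sql
-- Create a managed Delta table in the Spark catalog.
CREATE TABLE products (ProductID INT, Name STRING, Price FLOAT) USING DELTA;

INSERT INTO products VALUES (1, 'Widget', 2.99);

-- Unlike plain Parquet, Delta tables support transactional updates.
UPDATE products SET Price = 3.49 WHERE ProductID = 1;
```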
Module 10: Analyse data in a relational data warehouse
Learning objectives
- Design a schema for a relational data warehouse.
- Create fact, dimension, and staging tables (example below).
- Use SQL to load data into data warehouse tables.
- Use SQL to query relational data warehouse tables.
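To illustrate the design topics, a fact table in a dedicated SQL pool is typically hash-distributed on a join key and stored as a clustered columnstore index. A sketch with hypothetical columns:

```sql
-- A simplified fact table for a dedicated SQL pool; names are hypothetical.
CREATE TABLE dbo.FactSales
(
    OrderDateKey INT NOT NULL,
    CustomerKey  INT NOT NULL,
    ProductKey   INT NOT NULL,
    Quantity     INT NOT NULL,
    SalesAmount  DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);
```

Smaller dimension tables are often created with DISTRIBUTION = REPLICATE instead, so joins avoid data movement.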
Module 11: Load data into a relational data warehouse
Learning objectives
- Load staging tables in a data warehouse (example below)
- Load dimension tables in a data warehouse
- Load time dimensions in a data warehouse
- Load slowly changing dimensions in a data warehouse
- Load fact tables in a data warehouse
- Perform post-load optimisations in a data warehouse
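For instance, staging tables are commonly bulk-loaded with the COPY statement. A sketch assuming a hypothetical staging table and storage path:

```sql
-- Bulk-load CSV files from the data lake into a staging table.
-- The table name and URL are hypothetical.
COPY INTO dbo.StageProduct
FROM 'https://mydatalake.blob.core.windows.net/data/stage_product/*.csv'
WITH
(
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = ',',
    FIRSTROW = 2
);
```

From the staging table, INSERT ... SELECT or CTAS statements then populate the dimension and fact tables.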
Module 12: Build a data pipeline in Azure Synapse Analytics
Learning objectives
- Describe core concepts for Azure Synapse Analytics pipelines.
- Create a pipeline in Azure Synapse Studio.
- Implement a data flow activity in a pipeline.
- Initiate and monitor pipeline runs.
Module 13: Use Spark Notebooks in an Azure Synapse Pipeline
Learning objectives
- Describe notebook and pipeline integration.
- Use a Synapse notebook activity in a pipeline.
- Use parameters with a notebook activity.
Module 14: Plan hybrid transactional and analytical processing using Azure Synapse Analytics
Learning objectives
- Describe hybrid transactional/analytical processing (HTAP) patterns.
- Identify Azure Synapse Link services for HTAP.
Module 15: Implement Azure Synapse Link with Azure Cosmos DB
Learning objectives
- Configure an Azure Cosmos DB Account to use Azure Synapse Link.
- Create an analytical store-enabled container.
- Create a linked service for Azure Cosmos DB.
- Analyse linked data using Spark.
- Analyse linked data using Synapse SQL (example below).
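As a hedged sketch of the Synapse SQL objective, a serverless SQL pool can query the analytical store of a Synapse Link-enabled container with OPENROWSET (the account, database, container, and key are placeholders):

```sql
-- Query the analytical store of a Synapse Link-enabled Cosmos DB container.
-- Account, database, container, and key are placeholders.
SELECT TOP 10 *
FROM OPENROWSET(
    'CosmosDB',
    'Account=my-cosmos-account;Database=my-database;Key=<account-key>',
    MyContainer
) AS documents;
```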
Module 16: Implement Azure Synapse Link for SQL
Learning objectives
- Understand key concepts and capabilities of Azure Synapse Link for SQL.
- Configure Azure Synapse Link for Azure SQL Database.
- Configure Azure Synapse Link for Microsoft SQL Server.
Module 17: Get started with Azure Stream Analytics
Learning objectives
- Understand data streams.
- Understand event processing.
- Understand window functions (example below).
- Get started with Azure Stream Analytics.
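For example, window functions group an unbounded stream into bounded sets that can be aggregated. A sketch of a Stream Analytics query using a tumbling window (the input and output aliases are hypothetical job configuration):

```sql
-- Count readings per device over consecutive 10-second windows.
-- [sensor-input] and [aggregated-output] are hypothetical job aliases.
SELECT
    DeviceId,
    COUNT(*) AS ReadingCount
INTO [aggregated-output]
FROM [sensor-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY DeviceId, TumblingWindow(second, 10);
```

Hopping, sliding, and session windows follow the same pattern with different window functions.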
Module 18: Ingest streaming data using Azure Stream Analytics and Azure Synapse Analytics
Learning objectives
- Describe common stream ingestion scenarios for Azure Synapse Analytics.
- Configure inputs and outputs for an Azure Stream Analytics job.
- Define a query to ingest real-time data into Azure Synapse Analytics.
- Run a job to ingest real-time data, and consume that data in Azure Synapse Analytics.
Module 19: Visualise real-time data with Azure Stream Analytics and Power BI
Learning objectives
- Configure a Stream Analytics output for Power BI.
- Use a Stream Analytics query to write data to Power BI.
- Create a real-time data visualisation in Power BI.
Module 20: Introduction to Microsoft Purview
Learning objectives
- Evaluate whether Microsoft Purview is appropriate for data discovery and governance needs.
- Describe how the features of Microsoft Purview work to provide data discovery and governance.
Module 21: Integrate Microsoft Purview and Azure Synapse Analytics
Learning objectives
- Catalog Azure Synapse Analytics database assets in Microsoft Purview.
- Configure Microsoft Purview integration in Azure Synapse Analytics.
- Search the Microsoft Purview catalog from Synapse Studio.
- Track data lineage in Azure Synapse Analytics pipeline activities.
Module 22: Explore Azure Databricks
Learning objectives
- Provision an Azure Databricks workspace.
- Identify core workloads and personas for Azure Databricks.
- Describe key concepts of an Azure Databricks solution.
Module 23: Use Apache Spark in Azure Databricks
Learning objectives
- Describe key elements of the Apache Spark architecture.
- Create and configure a Spark cluster.
- Describe use cases for Spark.
- Use Spark to process and analyse data stored in files (example below).
- Use Spark to visualise data.
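To illustrate, a Databricks notebook can expose files in the lake as a queryable view using Spark SQL. A minimal sketch with a hypothetical mount path:

```sql
-- Spark SQL (e.g. in a %sql notebook cell): query CSV files in the lake.
-- The path is a hypothetical mount point.
CREATE TEMPORARY VIEW sales
USING CSV
OPTIONS (path '/mnt/data/sales/', header 'true', inferSchema 'true');

SELECT Region, SUM(Amount) AS TotalSales
FROM sales
GROUP BY Region;
```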
Module 24: Run Azure Databricks Notebooks with Azure Data Factory
Learning objectives
- Describe how Azure Databricks notebooks can be run in a pipeline.
- Create an Azure Data Factory linked service for Azure Databricks.
- Use a Notebook activity in a pipeline.
- Pass parameters to a notebook.
Frequently asked questions
How can I create an account on myQA.com?
There are a number of ways to create an account. If you are a self-funder, simply select the "Create account" option on the login page.
If you have been booked onto a course by your company, you will receive a confirmation email. From this email, select "Sign into myQA" and you will be taken to the "Create account" page. Complete all of the details and select "Create account".
If you have the booking number, you can also go here and select the "I have a booking number" option. Enter the booking reference and your surname. If the details match, you will be taken to the "Create account" page from where you can enter your details and confirm your account.
Find more answers to frequently asked questions in our FAQs: Bookings & Cancellations page.
How do QA’s virtual classroom courses work?
Our virtual classroom courses allow you to access award-winning classroom training, without leaving your home or office. Our learning professionals are specially trained on how to interact with remote attendees and our remote labs ensure all participants can take part in hands-on exercises wherever they are.
We use the WebEx video conferencing platform by Cisco. Before you book, check that you meet the WebEx system requirements and run a test meeting (more details in the link below) to ensure the software is compatible with your firewall settings. If it doesn’t work, try adjusting your settings or contact your IT department about permitting the website.
Learn more about our Virtual Classrooms.
How do QA’s online courses work?
QA online courses, also commonly known as distance learning courses or elearning courses, take the form of interactive software designed for individual learning, but you will also have access to full support from our subject-matter experts for the duration of your course. When you book a QA online learning course you will receive immediate access to it through our e-learning platform and you can start to learn straight away, from any compatible device. Access to the online learning platform is valid for one year from the booking date.
All courses are built around case studies and presented in an engaging format, which includes storytelling elements, video, audio and humour. Every case study is supported by sample documents and a collection of Knowledge Nuggets that provide more in-depth detail on the wider processes.
Learn more about QA’s online courses.
When will I receive my joining instructions?
Joining instructions for QA courses are sent two weeks prior to the course start date, or immediately if the booking is confirmed within this timeframe. For course bookings made via QA but delivered by a third-party supplier, joining instructions are sent to attendees prior to the training course, but timescales vary depending on each supplier’s terms. Read more FAQs.
When will I receive my certificate?
Certificates of Achievement are issued at the end of the course, either as a hard copy or via email. Read more here.