
Overview

This course serves as an appropriate entry point to learn Advanced Data Engineering with Databricks.

Below, we describe each of the four four-hour modules included in this course.


Prerequisites

Participants should have:

  • Experience using PySpark APIs for advanced data transformations.
  • Familiarity with implementing Python classes.
  • Experience using SQL in production environments such as data warehouses or data lakes.
  • Hands-on experience with Databricks notebooks and cluster configuration.
  • Understanding of Delta Lake table creation and manipulation with SQL.

These prerequisites can be met by completing the Data Engineering with Databricks and Apache Spark Programming with Databricks courses and by earning the Databricks Certified Data Engineer Associate and Databricks Certified Associate Developer for Apache Spark certifications.

If you do not have one or more of these prerequisites, QA recommends completing the courses listed above; they can be taken in either order.

Target audience

This course is ideal for:

  • Data engineers aiming to optimise and scale data processing in Databricks.
  • Big data professionals working with streaming and batch data.
  • Cloud engineers focused on Delta Lake architectures and Structured Streaming.
  • Individuals preparing for the Databricks Certified Data Engineer Professional exam.

Delegates will learn how to

Databricks Streaming and Delta Live Tables

This course provides a comprehensive understanding of Spark Structured Streaming and Delta Lake, including computation models, configuration for streaming read, and maintaining data quality in a streaming environment.
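The event-time windowing and watermark semantics this module covers can be simulated in plain Python. The sketch below is conceptual only, it is not the Structured Streaming API: it buckets events into 10-minute tumbling windows and drops events that arrive after the watermark has passed their window.

```python
from collections import defaultdict

WINDOW = 600  # 10-minute tumbling windows, in seconds
DELAY = 300   # watermark: tolerate events up to 5 minutes late

def aggregate_with_watermark(events):
    """Count events per event-time window, dropping late arrivals.

    `events` is an iterable of (event_time, key) tuples in arrival order.
    This mimics the semantics of Structured Streaming's
    withWatermark(...).groupBy(window(...)), not its API.
    """
    counts = defaultdict(int)
    max_event_time = 0
    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - DELAY
        if event_time < watermark:
            continue  # too late: the engine has finalized this window
        window_start = (event_time // WINDOW) * WINDOW
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "a"), (610, "a"), (50, "a"), (900, "b"), (100, "a")]
print(aggregate_with_watermark(events))
# The (50, "a") and (100, "a") events arrive after the watermark
# reaches 310 and are discarded.
```

In the real API, the same trade-off applies: a longer watermark delay keeps state around longer but tolerates later data.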

Databricks Data Privacy

This content is intended for data engineers, whether customers, partners, or employees, who perform data engineering tasks with Databricks. It provides the knowledge and skills needed to carry out these activities effectively on the Databricks platform.
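One technique from this module, pseudonymization, can be sketched in plain Python. The salt and function names below are illustrative, not taken from the course materials: a keyed hash replaces a PII value with a deterministic token, so joins across tables still work while the raw value is hidden.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would live in a secret store
# (e.g. Databricks secrets), never hard-coded.
SALT = b"example-secret-salt"

def pseudonymize(value: str) -> str:
    """Replace a PII value with a keyed, deterministic token.

    Deterministic, so the same input always yields the same token.
    Anyone holding the salt could re-hash candidate values to reverse
    it, which is why the salt itself must be protected.
    """
    return hmac.new(SALT, value.encode("utf-8"), hashlib.sha256).hexdigest()

print(pseudonymize("alice@example.com") == pseudonymize("alice@example.com"))  # deterministic
print(pseudonymize("alice@example.com") != pseudonymize("bob@example.com"))    # distinct inputs differ
```

Anonymization, by contrast, is irreversible by design; the course covers when each approach satisfies a given regulatory requirement.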

Databricks Performance Optimization

In this course, you’ll learn how to optimize workloads and physical layout with Spark and Delta Lake, and how to analyze the Spark UI to assess performance and debug applications. We’ll cover topics like streaming, liquid clustering, data skipping, caching, Photon, and more.
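Data skipping, one of the topics above, can be illustrated with a small sketch (the file names and statistics here are hypothetical). Delta Lake records per-file min/max column statistics, so a selective filter only needs to read files whose value range could match:

```python
# Each Delta data file carries min/max statistics per column.
# A query like WHERE id BETWEEN 150 AND 160 can skip any file
# whose [min_id, max_id] range does not overlap the predicate.
file_stats = [
    {"path": "part-000.parquet", "min_id": 0,   "max_id": 99},
    {"path": "part-001.parquet", "min_id": 100, "max_id": 199},
    {"path": "part-002.parquet", "min_id": 200, "max_id": 299},
]

def files_to_scan(lo, hi):
    """Return only the files whose [min, max] range overlaps [lo, hi]."""
    return [f["path"] for f in file_stats
            if f["max_id"] >= lo and f["min_id"] <= hi]

print(files_to_scan(150, 160))  # only part-001 needs to be read
```

Liquid clustering improves on this by physically co-locating rows with similar key values, which tightens each file’s min/max ranges and makes this pruning more effective.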

Automated Deployment with Databricks Asset Bundles

This course provides a comprehensive review of DevOps principles and their application to Databricks projects. It begins with an overview of core DevOps, DataOps, continuous integration (CI), continuous deployment (CD), and testing, and explores how these principles can be applied to data engineering pipelines.

The course then focuses on continuous deployment within the CI/CD process, examining tools like the Databricks REST API, SDK, and CLI for project deployment. You will learn about Databricks Asset Bundles (DABs) and how they fit into the CI/CD process. You’ll dive into their key components, folder structure, and how they streamline deployment across various target environments in Databricks. You will also learn how to add variables, modify, validate, deploy, and execute Databricks Asset Bundles for multiple environments with different configurations using the Databricks CLI.
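As a rough illustration of the bundle components discussed above, a minimal `databricks.yml` might look like this (the bundle, job, notebook path, and variable names are hypothetical, not from the course materials):

```yaml
# databricks.yml -- minimal sketch of a Databricks Asset Bundle
bundle:
  name: demo_etl

variables:
  catalog:
    default: dev_catalog

resources:
  jobs:
    demo_job:
      name: demo-etl-${bundle.target}
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/etl.ipynb

targets:
  dev:
    default: true
  prod:
    variables:
      catalog: prod_catalog
```

With a definition like this, `databricks bundle validate` checks the configuration and `databricks bundle deploy -t prod` deploys it to the `prod` target with its overridden variables.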

Finally, the course introduces Visual Studio Code as an Interactive Development Environment (IDE) for building, testing, and deploying Databricks Asset Bundles locally, optimizing your development process. The course concludes with an introduction to automating deployment pipelines using GitHub Actions to enhance the CI/CD workflow with Databricks Asset Bundles.
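The GitHub Actions integration mentioned above could be sketched as a minimal workflow (the repository secrets and target name are assumptions for illustration, not course material):

```yaml
# .github/workflows/deploy.yml -- sketch of a deploy-on-push pipeline
name: deploy-bundle
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```

The same `databricks bundle` commands used interactively during development run unchanged in the pipeline, which is what makes DABs a natural fit for CI/CD.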

By the end of this course, you will be equipped to automate Databricks project deployments with Databricks Asset Bundles, improving efficiency through DevOps practices.


Outline

Databricks Streaming and Delta Live Tables

  • Streaming Data Concepts
  • Introduction to Structured Streaming
  • Demo: Reading from a Streaming Query
  • Streaming from Delta Lake
  • Lab: Streaming Query Lab
  • Aggregation, Time Windows, Watermarks
  • Event Time + Aggregations over Time Windows
  • Lab: Stream Aggregation Lab
  • Demo: Windowed Aggregation with Watermark
  • Data Ingestion Pattern
  • Demo: Auto Load to Bronze
  • Demo: Stream from Multiplex Bronze
  • Quality Enforcement Pattern
  • Demo: Quality Enforcement
  • Lab: Streaming ETL Lab

Databricks Data Privacy

  • Regulatory Compliance
  • Data Privacy
  • Key Concepts and Components
  • Audit Your Data
  • Data Isolation
  • Demo: Securing Data in Unity Catalog
  • Pseudonymization & Anonymization
  • Summary & Best Practices
  • Demo: PII Data Security
  • Capturing Changed Data
  • Deleting Data in Databricks
  • Demo: Processing Records from CDF and Propagating Changes
  • Lab: Propagating Changes with CDF Lab

Databricks Performance Optimization

  • Spark UI Introduction
  • Designing the Foundation
  • Demo: File Explosion
  • Data Skipping and Liquid Clustering
  • Lab: Data Skipping and Liquid Clustering
  • Skew
  • Shuffles
  • Demo: Shuffle
  • Spill
  • Lab: Exploding Join
  • Serialization
  • Demo: User-Defined Functions
  • Fine-Tuning: Choosing the Right Cluster
  • Pick the Best Instance Types

Automated Deployment with Databricks Asset Bundles

  • DevOps Review
  • Continuous Integration and Continuous Deployment/Delivery (CI/CD) Review
  • Demo: Course Setup and Authentication
  • Deploying Databricks Projects
  • Introduction to Databricks Asset Bundles (DABs)
  • Demo: Deploying a Simple DAB
  • Lab: Deploying a Simple DAB
  • Variable Substitutions in DABs
  • Demo: Deploying a DAB to Multiple Environments
  • Lab: Deploy a DAB to Multiple Environments
  • DAB Project Templates Overview
  • Lab: Use a Databricks Default DAB Template
  • CI/CD Project Overview with DABs
  • Demo: Continuous Integration and Continuous Deployment with DABs
  • Lab: Adding ML to Engineering Workflows with DABs
  • Developing Locally with Visual Studio Code (VSCode)
  • Demo: Using VSCode with Databricks
  • CI/CD Best Practices for Data Engineering
  • Next Steps: Automated Deployment with GitHub Actions

Databricks training partner

Maximize your data and AI potential with Databricks certified training. Bridge skills gaps across your organization and accelerate data-driven innovation by enabling teams to scale insights and deploy AI for business growth.

Need to know

Frequently asked questions

How can I create an account on myQA.com?

There are a number of ways to create an account. If you are a self-funder, simply select the "Create account" option on the login page.

If you have been booked onto a course by your company, you will receive a confirmation email. From this email, select "Sign into myQA" and you will be taken to the "Create account" page. Complete all of the details and select "Create account".

If you have the booking number you can also go here and select the "I have a booking number" option. Enter the booking reference and your surname. If the details match, you will be taken to the "Create account" page from where you can enter your details and confirm your account.

Find more answers to frequently asked questions in our FAQs: Bookings & Cancellations page.

How do QA’s virtual classroom courses work?

Our virtual classroom courses allow you to access award-winning classroom training, without leaving your home or office. Our learning professionals are specially trained on how to interact with remote attendees and our remote labs ensure all participants can take part in hands-on exercises wherever they are.

We use the WebEx video conferencing platform by Cisco. Before you book, check that you meet the WebEx system requirements and run a test meeting to ensure the software is compatible with your firewall settings. If it doesn’t work, try adjusting your settings or contact your IT department about permitting the website.

How do QA’s online courses work?

QA online courses, also commonly known as distance learning courses or elearning courses, take the form of interactive software designed for individual learning, but you will also have access to full support from our subject-matter experts for the duration of your course. When you book a QA online learning course you will receive immediate access to it through our e-learning platform and you can start to learn straight away, from any compatible device. Access to the online learning platform is valid for one year from the booking date.

All courses are built around case studies and presented in an engaging format, which includes storytelling elements, video, audio and humour. Every case study is supported by sample documents and a collection of Knowledge Nuggets that provide more in-depth detail on the wider processes.

When will I receive my joining instructions?

Joining instructions for QA courses are sent two weeks prior to the course start date, or immediately if the booking is confirmed within this timeframe. For course bookings made via QA but delivered by a third-party supplier, joining instructions are sent to attendees prior to the training course, but timescales vary depending on each supplier’s terms.

When will I receive my certificate?

Certificates of Achievement are issued at the end of the course, either as a hard copy or via email.

Let's talk

A member of the team will contact you within 4 working hours after submitting the form.

By submitting this form, you agree to QA processing your data in accordance with our Privacy Policy and Terms & Conditions. You can unsubscribe at any time by clicking the link in our emails or contacting us directly.