Overview
This course serves as an appropriate entry point for learning Apache Spark Programming with Databricks.
Below, we describe each of the four four-hour modules included in this course.
Prerequisites
Participants should have:
- Familiarity with Python and fundamental programming concepts, including data types, lists, dictionaries, variables, functions, loops, conditional statements, exception handling, accessing classes, and using third-party libraries.
- Basic knowledge of SQL, including writing queries using SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, and JOIN.
If you do not have one or more of the prerequisites, QA recommends:
Target Audience
This course is designed for:
- Data engineers and data scientists looking to enhance their Spark programming skills.
- Developers who want to leverage Apache Spark and Delta Lake on Databricks.
- Professionals working with large-scale data processing and real-time analytics.
Delegates will learn how to
Introduction to Apache Spark
This course offers essential knowledge of Apache Spark, with a focus on its distributed architecture and practical applications for large-scale data processing. Participants will explore programming frameworks, learn the Spark DataFrame API, and develop skills for reading, writing, and transforming data using Python-based Spark workflows.
Developing Applications with Apache Spark
Master scalable data processing with Apache Spark in this hands-on course. Learn to build efficient ETL pipelines, perform advanced analytics, and optimize distributed data transformations using Spark’s DataFrame API. Explore grouping, aggregation, joins, set operations, and window functions. Work with complex data types such as arrays, maps, and structs while applying best practices for performance optimization.
Stream Processing and Analysis with Apache Spark
Learn the essentials of stream processing and analysis with Apache Spark in this course. Gain a solid understanding of stream processing fundamentals and develop applications using the Spark Structured Streaming API. Explore advanced techniques such as stream aggregation and window analysis to process real-time data efficiently. This course equips you with the skills to create scalable and fault-tolerant streaming applications for dynamic data environments.
Monitoring and Optimizing Apache Spark Workloads on Databricks
This course explores the Lakehouse architecture and the Medallion architecture for scalable data workflows, focusing on Unity Catalog for secure data governance, access control, and lineage tracking. The curriculum includes building reliable, ACID-compliant pipelines with Delta Lake. You'll examine Spark optimization techniques such as partitioning, caching, and query tuning, and learn performance monitoring, troubleshooting, and best practices for efficient data engineering and analytics in real-world scenarios.
Outline
Introduction to Apache Spark
- Spark Runtime Architecture
- Exploring Apache Spark Architecture in Databricks
- Introduction to Spark DataFrames and SQL
- Reading and Writing Data with DataFrames
- Distributed System Programming Fundamentals
- Basic ETL with the DataFrame API
- Flight Data ETL with the DataFrame API
- Analyzing Transaction Data with DataFrames
Developing Applications with Apache Spark
- DataFrame API Basics
- Demo: (Optional) Basic ETL with the DataFrame API
- Grouping and Aggregating Data
- Demo: Grouping and Aggregating Data
- Lab: Grouping and Aggregating E-Commerce Data
- Relational Operations
- Demo: Data Relational Operations in Apache Spark
- Working with Complex Data
- Demo: Working with Complex Data Types in Apache Spark
- Lab: Working with Complex Data Types in E-Commerce Data
Stream Processing and Analysis with Apache Spark
- Introduction to Stream Processing
- Spark Structured Streaming
- Demo: Introduction to Spark Structured Streaming
- Lab: Introduction to Spark Structured Streaming
- Advanced Stream Processing and Analysis
- Demo: Window Aggregation in Spark Structured Streaming
- Lab: Window Aggregation in Spark Structured Streaming
Monitoring and Optimizing Apache Spark Workloads on Databricks
- Apache Spark and Databricks
- Using Apache Spark with Delta Lake
- Demo: Introduction to Delta Lake
- Lab: Introduction to Delta Lake
- Optimizing Apache Spark
- Demo: Optimizing Apache Spark
- Lab: Optimizing Apache Spark

Frequently asked questions
How can I create an account on myQA.com?
There are a number of ways to create an account. If you are a self-funder, simply select the "Create account" option on the login page.
If you have been booked onto a course by your company, you will receive a confirmation email. From this email, select "Sign into myQA" and you will be taken to the "Create account" page. Complete all of the details and select "Create account".
If you have the booking number, you can also select the "I have a booking number" option. Enter the booking reference and your surname. If the details match, you will be taken to the "Create account" page, where you can enter your details and confirm your account.
Find more answers to frequently asked questions in our FAQs: Bookings & Cancellations page.
How do QA’s virtual classroom courses work?
Our virtual classroom courses allow you to access award-winning classroom training without leaving your home or office. Our learning professionals are specially trained to interact with remote attendees, and our remote labs ensure all participants can take part in hands-on exercises wherever they are.
We use the WebEx video conferencing platform by Cisco. Before you book, check that you meet the WebEx system requirements and run a test meeting to ensure the software is compatible with your firewall settings. If it doesn’t work, try adjusting your settings or contact your IT department about permitting the website.
How do QA’s online courses work?
QA online courses, also commonly known as distance learning or e-learning courses, take the form of interactive software designed for individual learning, but you will also have access to full support from our subject-matter experts for the duration of your course. When you book a QA online learning course, you will receive immediate access to it through our e-learning platform and can start learning straight away from any compatible device. Access to the online learning platform is valid for one year from the booking date.
All courses are built around case studies and presented in an engaging format, which includes storytelling elements, video, audio and humour. Every case study is supported by sample documents and a collection of Knowledge Nuggets that provide more in-depth detail on the wider processes.
When will I receive my joining instructions?
Joining instructions for QA courses are sent two weeks prior to the course start date, or immediately if the booking is confirmed within this timeframe. For course bookings made via QA but delivered by a third-party supplier, joining instructions are sent to attendees prior to the training course, but timescales vary depending on each supplier’s terms.
When will I receive my certificate?
Certificates of Achievement are issued at the end of the course, either as a hard copy or via email.