Overview
Data Engineering on AWS is a 3-day intermediate course, designed for professionals seeking a deep dive into data engineering practices and solutions on AWS. Through a balanced combination of theory, practical labs, and activities, participants learn to design, build, optimize, and secure data engineering solutions using AWS services. From foundational concepts to hands-on implementation of data lakes, data warehouses, and both batch and streaming data pipelines, this course equips data professionals with the skills needed to architect and manage modern data solutions at scale.
Prerequisites
Participants should have:
- Familiarity with basic machine learning concepts, such as supervised and unsupervised learning, regression, classification, and clustering algorithms.
- Working knowledge of Python programming language and common data science libraries like NumPy, Pandas, and Scikit-learn.
- Basic understanding of cloud computing concepts and familiarity with the AWS platform.
- Familiarity with SQL and relational databases is recommended but not mandatory.
- Experience with version control systems like Git is beneficial but not required.
Target Audience
This course is designed for professionals who are interested in designing, building, optimizing, and securing data engineering solutions using AWS services.
Learning Objectives
By the end of this course, learners will be able to:
- Understand the foundational roles and key concepts of data engineering, including data personas, data discovery, and relevant AWS services.
- Identify and explain the various AWS tools and services crucial for data engineering, encompassing orchestration, security, monitoring, CI/CD, IaC, networking, and cost optimization.
- Design and implement a data lake solution on AWS, including storage, data ingestion, transformation, and serving data for consumption.
- Optimize and secure a data lake solution by implementing open table formats, security measures, and troubleshooting common issues.
- Design and set up a data warehouse using Amazon Redshift Serverless, understanding its architecture, data ingestion, processing, and serving capabilities.
- Apply performance optimization techniques to data warehouses in Amazon Redshift, including monitoring, data optimization, query optimization, and orchestration.
- Manage security and access control for data warehouses in Amazon Redshift, understanding authentication, data security, auditing, and compliance.
- Design effective batch data pipelines using appropriate AWS services for processing and transforming data.
- Implement comprehensive strategies for batch data pipelines, covering data processing, transformation, integration, cataloging, and serving data for consumption.
- Optimize, orchestrate, and secure batch data pipelines, demonstrating advanced skills in data processing automation and security.
- Architect streaming data pipelines, understanding various use cases, ingestion, storage, processing, and analysis using AWS services.
- Optimize and secure streaming data solutions, including compliance considerations and access control.
Course Outline
Day 1
Module 1: Data Engineering Roles and Key Concepts
- Role of a Data Engineer
- Key functions of a Data Engineer
- Data Personas
- Data Discovery
- AWS Data Services
Module 2: AWS Data Engineering Tools and Services
- Orchestration and Automation
- Data Engineering Security
- Monitoring
- Continuous Integration and Continuous Delivery
- Infrastructure as Code
- AWS Serverless Application Model
- Networking Considerations
- Cost Optimization Tools
Module 3: Designing and Implementing Data Lakes
- Data lake introduction
- Data lake storage
- Ingest data into a data lake
- Catalog data
- Transform data
- Serve data for consumption
- Hands-on lab: Setting up a Data Lake on AWS
Module 4: Optimizing and Securing a Data Lake Solution
- Open Table Formats
- Security using AWS Lake Formation
- Setting permissions with Lake Formation
- Security and governance
- Troubleshooting
- Hands-on lab: Automating Data Lake Creation using AWS Lake Formation Blueprints
Day 2
Module 5: Data Warehouse Architecture and Design Principles
- Introduction to data warehouses
- Amazon Redshift Overview
- Ingesting data into Redshift
- Processing data
- Serving data for consumption
- Hands-on lab: Setting up a Data Warehouse using Amazon Redshift Serverless
Module 6: Performance Optimization Techniques for Data Warehouses
- Monitoring and optimization options
- Data optimization in Amazon Redshift
- Query optimization in Amazon Redshift
- Orchestration options
Module 7: Security and Access Control for Data Warehouses
- Authentication and access control in Amazon Redshift
- Data security in Amazon Redshift
- Auditing and compliance in Amazon Redshift
- Hands-on lab: Managing Access Control in Redshift
Module 8: Designing Batch Data Pipelines
- Introduction to batch data pipelines
- Designing a batch data pipeline
- AWS services for batch data processing
Module 9: Implementing Strategies for Batch Data Pipeline
- Elements of a batch data pipeline
- Processing and transforming data
- Integrating and cataloging your data
- Serving data for consumption
- Hands-on lab: A Day in the Life of a Data Engineer
Day 3
Module 10: Optimizing, Orchestrating, and Securing Batch Data Pipelines
- Optimizing the batch data pipeline
- Orchestrating the batch data pipeline
- Securing the batch data pipeline
- Hands-on lab: Orchestrating Data Processing in Spark using AWS Step Functions
Module 11: Streaming Data Architecture Patterns
- Introduction to streaming data pipelines
- Ingesting data from stream sources
- Streaming data ingestion services
- Storing streaming data
- Processing Streaming Data
- Analyzing Streaming Data with AWS Services
- Hands-on lab: Streaming Analytics with Amazon Managed Service for Apache Flink
Module 12: Optimizing and Securing Streaming Solutions
- Optimizing a streaming data solution
- Securing a streaming data pipeline
- Compliance considerations
- Hands-on lab: Access Control with Amazon Managed Streaming for Apache Kafka
Exams and Assessments
This course helps learners prepare for the AWS Certified Data Engineer - Associate (DEA-C01) exam.
Hands-On Learning
This course includes presentations, demonstrations, hands-on labs, and group exercises.
Frequently asked questions
How can I create an account on myQA.com?
There are a number of ways to create an account. If you are a self-funder, simply select the "Create account" option on the login page.
If you have been booked onto a course by your company, you will receive a confirmation email. From this email, select "Sign into myQA" and you will be taken to the "Create account" page. Complete all of the details and select "Create account".
If you have the booking number you can also go here and select the "I have a booking number" option. Enter the booking reference and your surname. If the details match, you will be taken to the "Create account" page from where you can enter your details and confirm your account.
Find more answers to frequently asked questions in our FAQs: Bookings & Cancellations page.
How do QA’s virtual classroom courses work?
Our virtual classroom courses allow you to access award-winning classroom training, without leaving your home or office. Our learning professionals are specially trained on how to interact with remote attendees and our remote labs ensure all participants can take part in hands-on exercises wherever they are.
We use the WebEx video conferencing platform by Cisco. Before you book, check that you meet the WebEx system requirements and run a test meeting to ensure the software is compatible with your firewall settings. If it doesn’t work, try adjusting your settings or contact your IT department about permitting the website.
How do QA’s online courses work?
QA online courses, also commonly known as distance learning courses or elearning courses, take the form of interactive software designed for individual learning, but you will also have access to full support from our subject-matter experts for the duration of your course. When you book a QA online learning course you will receive immediate access to it through our e-learning platform and you can start to learn straight away, from any compatible device. Access to the online learning platform is valid for one year from the booking date.
All courses are built around case studies and presented in an engaging format, which includes storytelling elements, video, audio and humour. Every case study is supported by sample documents and a collection of Knowledge Nuggets that provide more in-depth detail on the wider processes.
When will I receive my joining instructions?
Joining instructions for QA courses are sent two weeks prior to the course start date, or immediately if the booking is confirmed within this timeframe. For course bookings made via QA but delivered by a third-party supplier, joining instructions are sent to attendees prior to the training course, but timescales vary depending on each supplier’s terms. Read more FAQs.
When will I receive my certificate?
Certificates of Achievement are issued at the end of the course, either as a hard copy or via email. Read more here.