About this course

Course code TPDL23HD
Duration 2 Days

This course teaches business analysts and data scientists to use SAS Data Loader for Hadoop, which provides an intuitive web-based interface for profiling, integrating, cleansing, and moving big data in a Hadoop environment without writing code.

Prerequisites

  • There are currently no prerequisites for this course.

Who should attend

  • business users who interact with data, perform data discovery, query data, and ensure that data is in the proper place and format for other users
  • data analysts, data scientists, and statisticians who review the results of data discovery activities, create new tables and data elements, change the format or structure of data tables to view them in a variety of ways, manipulate and score data elements, and load data for use by other users
  • data management specialists who apply enterprise standa

Delegates will learn how to

  • move data in and out of Hadoop
  • interrogate and profile data for quality issues
  • transform, transpose, and join data so that it is fit for purpose
  • cleanse and integrate data suitable for analysis and reporting
  • load data into the SAS In-Memory Analytics Server for analytics and exploration
  • execute custom SAS and HiveQL code inside the Hadoop cluster

Outline

Introduction

  • why SAS?
  • why SAS and Hadoop?
  • why SAS Data Loader for Hadoop?
  • introduction to the big data era
  • why Hadoop?

SAS Data Loader Overview

  • introduction to virtual applications
  • introduction to SAS Data Loader (vApp)

SAS Data Loader functionality

  • navigating the SAS Data Loader interface
  • steps common to most directives

Methodology and Course Flow

  • SAS Data Loader use cases
  • methodology for preparing data for analytics
  • course overview and demo/exercise logistics

Acquiring and Discovering Data

  • copy tables to Hadoop
  • import text files into Hadoop
  • profile data in Hadoop for data quality issues
  • query data in Hadoop to understand structure and content

Transforming and Transposing Data

  • transform data in Hadoop
  • transpose data in Hadoop

Cleansing Data

  • parse data into meaningful subsets to provide a basis for analysis
  • standardize data into a consistent format and structure
  • generate match codes to support fuzzy matches for joining tables
  • identify and categorize data in Hadoop
  • filter data rows using business rules or Hive expressions

Integrating Data

  • create queries to select and join tables using inner, full outer, left, and right join types
  • join dissimilar tables using generated match codes
  • sort, de-duplicate, and manage columns and data
  • execute SAS programs in Hadoop using SAS DS2 language elements
  • run a Hive program built with the expression builder or pasted in from your own code

Delivering Data

  • load data to LASR
  • copy data from Hadoop

Additional Topics

  • SAS Data Loader vApp settings
  • SAS Data Loader configurations
  • SAS and Hadoop data processing
  • SAS DS2 programs
  • debugging Hadoop jobs

This is a QA-approved partner course.

Delivery Method

Classroom

Face-to-face learning in the comfort of our quality nationwide centres, with free refreshments and Wi-Fi.

Online booking is currently not available for this course. To find out more, please call us on 0345 074 7998 or email us at info@qa.com to discuss how we can help.

Trusted, awarded and accredited

Fully accredited to ensure we provide the highest possible standards in learning

All third party trademark rights acknowledged.