About this course

Course code TPBDCT
Duration 2 Days

This course teaches you how to detect errors in raw data source files as well as showing you how to identify and correct errors in character and numeric SAS data. You will learn a variety of techniques for detecting problems with more complex data structures, such as data sets requiring multiple observations per subject, or requiring entries for a single subject across multiple data sets. More than simply teaching techniques for detecting and fixing data errors, this is also an excellent SAS programming course. Novice and veteran SAS programmers alike will garner new and valuable programming tips and tricks.

Knowledge of the SAS macro language is not a prerequisite, but you will learn to run macros that perform a variety of data cleaning functions. As an example, the AUTO_OUTLIERS macro automatically reports on outliers in numeric data using a concept called 'trimmed statistics.'

The class closes with a demonstration of an innovative process that leverages integrity constraints and audit trails to detect and programmatically clean dirty data before it even gets into your analysis data set. After class, you'll have access to every program and macro used during class, as well as a personal copy of Cody's Data Cleaning Techniques, Second Edition.

Prerequisites

Before attending this course, participants should have completed the SAS Programming 1: Essentials course. Completion of the SAS Programming 2: Data Manipulation Techniques course or a minimum of one year of SAS programming experience is also recommended.

This course addresses Base SAS, SAS/STAT software.

Who Should Attend?

SAS programmers, analysts, and researchers

Delegates will learn how to

  • check character and numeric variables for invalid values
  • identify and count missing values for numeric and character variables
  • work with and manipulate dates
  • identify and count missing values for numeric and character variables
  • compare two data sets with and without an ID variable
  • identify and count missing values for numeric and character variables.

Outline

Checking Values of Character Variables

  • introduction
  • using the FREQ procedure to list values
  • using the DATA step to check for invalid values
  • using the PRINT procedure to list invalid values
  • using formats to check for invalid values

Checking Values of Numeric Variables

  • looking for outliers
  • using a WHERE statement with the PRINT procedure
  • using a DATA step to check for invalid values
  • creating a macro for range checking
  • using formats and informats to check for invalid values
  • looking for highest and lowest by percentage
  • looking for highest and lowest n values
  • detecting outliers based on the standard deviation and the trimmed statistics
  • detecting outliers based on the interquartile range
  • checking ranges of several variables

Checking for Missing Values

  • identifying missing values
  • counting missing and nonmissing values

Working with Dates

  • checking dates
  • working with dates in nonstandard form

Looking for Duplicates and n Observations per Subject

  • detecting duplicates
  • identifying duplicates and verifying n observations per subject

Working with Multiple Files

  • checking for an ID in each of two files
  • checking for an ID in each of three files

Double Entry and Verification

  • conducting a simple comparison
  • double entry verification

Correcting Errors, Integrity Constraints, and Audit Trails

  • making simple corrections
  • introduction to integrity constraints and audit trails

Self Study

2 Days

Duration

This is a QA approved partner course

Delivery Method

Delivery method

Classroom

Face-to-face learning in the comfort of our quality nationwide centres, with free refreshments and Wi-Fi.

Find dates and prices

Online booking is currently not available for this course, to find out more please call us on 0345 074 7998 or email us at info@qa.com to discuss how we can help.

Trusted, awarded and accredited

Fully accredited to ensure we provide the highest possible standards in learning

All third party trademark rights acknowledged.