About this course

Course code TPDIHPS
Duration 3 Days

In this course you will use processing methods to prepare structured and unstructured big data for analysis. You will learn to organize this data into structured tabular form using Apache Hive and Apache Pig. You will also learn SAS software technology and techniques that integrate with Hive and Pig, and how to leverage these open source capabilities by programming with Base SAS and SAS/ACCESS Interface to Hadoop, and with SAS Data Integration Studio.

Prerequisites

A basic understanding of and experience with UNIX and SQL is preferred. For advanced topics such as user-defined functions, prior programming experience is necessary.

Who should attend

Data scientists and programmers, database administrators, applications developers, and ETL developers who are looking for an in-depth technical overview of data management and extraction for big data and the Hadoop ecosystem.

Delegates will learn how to

  • move data into the Hadoop ecosystem
  • use Hive to design a data warehouse in Hadoop
  • perform data analysis using Hive Query Language
  • join data sources
  • perform extract, load, and transform (ELT) operations
  • organize data in Hadoop by usage
  • perform analysis on unstructured data using Apache Pig
  • join massive data sets using Pig
  • use user-defined functions (UDFs)
  • analyze big data in Hadoop using Hive and Pig
  • use SAS programming to submit Hive and Pig programs that execute in Hadoop and store results in Hadoop or return results to SAS
  • use SAS programming to move data between the SAS server and the Hadoop Distributed File System (HDFS)
  • construct SAS Data Integration Studio jobs that integrate with Hive and Pig processes and HDFS.

Outline

The Apache Hadoop Project

  • overview of the big data ecosystem
  • Hadoop essentials

Hive and HiveQL

  • Apache Hive overview
  • data definition language
  • data manipulation language

Pig and Pig Latin

  • Apache Pig overview
  • Apache Pig programming
  • advanced Apache Pig programming
  • Pig programming recommendations

SAS and Hadoop

  • SAS technology for Hadoop overview
  • programming with Base SAS and SAS/ACCESS
  • SAS Data Integration Studio
  • DS2 and the code accelerator for Hadoop
  • in-memory analytics

This is a QA approved partner course

Delivery method

Classroom

Face-to-face learning in the comfort of our quality nationwide centres, with free refreshments and Wi-Fi.

All third party trademark rights acknowledged.