About this course

Course type Premium
Course code VCLHWAHDJAV
Duration 4 Days
Special Notices

Please note: This course is delivered by accredited Hortonworks instructors. The syllabus includes specific use cases and examples to help illustrate and reinforce the theory and the functionality of the technology being explored. Where possible, the instructor will provide further examples and answer questions relevant to an individual delegate's specific application of the technology. However, due to the complexity of the technology and the breadth of application across industries, this may not always be possible in the classroom environment.

This advanced course gives Java programmers a deep dive into Hadoop application development.

Delegates will learn how to design and develop efficient and effective MapReduce applications for Hadoop using the Hortonworks Data Platform. This includes how to implement combiners, partitioners, secondary sorts, and custom input and output formats; join large datasets; unit test MapReduce jobs; and develop UDFs for Pig and Hive. Labs are run on a 7-node HDP 2.1 cluster running in a virtual machine that students can keep for use after the training.

Target Audience:

Experienced Java software engineers who need to develop Java MapReduce applications for Hadoop.

Prerequisites

Please note: Hortonworks courses are delivered using electronic courseware. If you are attending remotely (Virtual classes or Attend from Anywhere), you must ensure that you have dual monitors or a single monitor plus a tablet device. A second screen is required so that you can view labs and lab instructions on separate screens.

Technical prerequisites

  • Delegates must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle.

  • No prior Hadoop knowledge is required.

    Outline

    • Describe Hadoop 2 and the Hadoop Distributed File System
    • Describe the YARN framework
    • Develop and run a Java MapReduce application on YARN
    • Use combiners and in-map aggregation
    • Write a custom partitioner to avoid data skew on reducers
    • Perform a secondary sort
    • Recognize use cases for built-in input and output formats
    • Write a custom MapReduce input and output format
    • Optimize a MapReduce job
    • Configure MapReduce to optimize mappers and reducers
    • Develop a custom RawComparator class
    • Distribute files as LocalResources
    • Describe and perform join techniques in Hadoop
    • Perform unit tests using the MRUnit API
    • Describe the basic architecture of HBase
    • Write an HBase MapReduce application
    • List use cases for Pig and Hive
    • Write a simple Pig script to explore and transform big data
    • Write a Pig UDF (User-Defined Function) in Java
    • Write a Hive UDF in Java
    • Use the JobControl class to create a MapReduce workflow
    • Use Oozie to define and schedule workflows
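
To give a flavour of one outline topic, in-map aggregation, the following is a minimal sketch in plain Java. It is not taken from the courseware and deliberately uses no Hadoop classes; the class and method names (InMapAggregationSketch, naiveMap, aggregatingMap) are hypothetical. It assumes a word-count style job and compares emitting one pair per token with accumulating counts in memory and emitting each distinct word once per input split.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of in-map aggregation (hypothetical names, no Hadoop API). */
public class InMapAggregationSketch {

    /** Naive mapper: emits one (word, 1) pair for every token in the split. */
    public static List<Map.Entry<String, Integer>> naiveMap(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    emitted.add(new AbstractMap.SimpleImmutableEntry<>(token, 1));
                }
            }
        }
        return emitted;
    }

    /**
     * Aggregating mapper: keeps a running count per word in memory and
     * emits each distinct word once, after the whole split has been read.
     */
    public static Map<String, Integer> aggregatingMap(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("to be or not to be", "to do");
        System.out.println(naiveMap(split).size() + " pairs without aggregation");
        System.out.println(aggregatingMap(split).size() + " pairs with aggregation");
        System.out.println(aggregatingMap(split));
    }
}
```

For this sample input the naive version emits 8 pairs and the aggregating version emits 5, which is the kind of reduction in intermediate data that in-map aggregation aims for. The trade-off is that the in-memory map must fit within the memory available to the mapper.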


    Hands-On Labs

    • Configuring a Hadoop Development Environment
    • Putting data into HDFS using Java
    • Write a distributed grep MapReduce application
    • Write an inverted index MapReduce application
    • Configure and use a combiner
    • Writing custom combiners and partitioners
    • Globally sort output using the TotalOrderPartitioner
    • Writing a MapReduce job to sort data using a composite key
    • Writing a custom InputFormat class
    • Writing a custom OutputFormat class
    • Compute a simple moving average of stock price data
    • Use data compression
    • Define a RawComparator
    • Perform a map-side join
    • Using a Bloom filter
    • Unit testing a MapReduce job
    • Importing data into HBase
    • Writing an HBase MapReduce job
    • Writing User-Defined Pig and Hive functions
    • Defining an Oozie workflow
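
As an illustration of the calculation behind the moving-average lab, here is a minimal plain-Java sketch. It is not the lab solution, which is not shown in this description; the class name MovingAverageSketch, the method simpleMovingAverage, and the sample prices are all hypothetical. It assumes the prices for one stock have already been ordered by date.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Plain-Java sketch of a simple moving average (hypothetical names, no Hadoop API). */
public class MovingAverageSketch {

    /**
     * Returns the average of each full window of consecutive prices.
     * The input is assumed to be sorted by date already.
     */
    public static List<Double> simpleMovingAverage(List<Double> prices, int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        List<Double> averages = new ArrayList<>();
        Deque<Double> buffer = new ArrayDeque<>();
        double sum = 0.0;
        for (double price : prices) {
            buffer.addLast(price);
            sum += price;
            if (buffer.size() > window) {
                // Drop the oldest price so the buffer holds exactly one window.
                sum -= buffer.removeFirst();
            }
            if (buffer.size() == window) {
                averages.add(sum / window);
            }
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up closing prices, used only to exercise the method.
        List<Double> prices = Arrays.asList(10.0, 11.0, 12.0, 13.0, 14.0);
        System.out.println(simpleMovingAverage(prices, 3)); // prints [11.0, 12.0, 13.0]
    }
}
```

The sliding buffer keeps only one window of prices in memory at a time, so the same logic works on an arbitrarily long, date-ordered stream of values.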

    Delivery method

    Virtual learning

    Recreates a classroom experience online, enabling full interactions with the learning professional leading the course.

    Find dates and prices

    Sorry, we don't have any public dates scheduled for this course, but it can be run as a closed event for your company.
    Please contact us for details on alternative ways we can help: call 0845 757 3888 or email info@qa.com.

    Trusted, awarded and accredited

    Fully accredited to ensure we provide the highest possible standards in learning

    All third-party trademark rights acknowledged.