About this course

Course code TPBDMT12
Duration 3 Days

This course introduces a data mining methodology that is a superset of the SAS SEMMA methodology around which SAS Enterprise Miner is organized. The course also introduces a wide range of data mining algorithms, providing both theoretical knowledge and practical skills. In this class, you work through all the steps of a data mining project, beginning with problem definition and data selection, and continuing through data exploration, data transformation, sampling, partitioning, modeling, and assessment.

Prerequisites

  • No prior knowledge of statistical or data mining tools is required.

Who should attend

  • Business analysts, their managers, and statisticians

Delegates will learn how to

  • use a data mining methodology
  • build and use decision trees and neural networks for modeling and scoring
  • use survival analysis and create survival curves

Outline

Introduction to Data Mining

  • models
  • what is data mining?
  • profiling and prediction
  • directed and undirected data mining

Data Mining Methodology

  • translating business problems into data mining problems
  • why have a methodology?
  • how data miners can inadvertently learn things that are not true
  • finding the right input variables
  • the importance of model stability
  • data preparation
  • partitioning to create training, validation, and test sets
  • sampling to create balanced model sets
  • model assessment

Data Exploration

  • data structure
  • summary statistics
  • data values
  • developing intuition about data
  • histograms
  • data types
  • using SAS Enterprise Miner for data exploration
  • exploring distributions

Regression Models

  • confidence bounds
  • the null hypothesis
  • statistical significance
  • variance and standard deviation
  • linear regression
  • correlation
  • logistic regression
  • standardized values
  • using SAS Enterprise Miner to build regression models

Decision Trees

  • decision trees for modeling and scoring
  • decision trees for variable selection
  • alternate representations of decision trees
  • algorithms used to build decision trees
  • splitting criteria
  • recognizing instability and overfitting in decision tree models
  • capturing interactions between variables
  • using SAS Enterprise Miner to build decision trees
  • decision trees as data exploration and classification tools

Neural Networks

  • neural networks compared with regression
  • algorithms used to train neural networks
  • creating neural network models using SAS Enterprise Miner
  • origins of neural networks
  • picking appropriate inputs for neural networks
  • data preparation requirements for neural networks

Memory-Based Reasoning

  • similarity and distance
  • the role of the training set in memory-based reasoning (MBR)
  • combining the votes of several neighbors
  • other K-nearest neighbor techniques
  • using the SAS Enterprise Miner MBR node
  • distance metrics appropriate for different kinds of data
  • collaborative filtering

Clustering

  • divisive clustering
  • more on similarity and distance
  • the k-means algorithm
  • finding clusters with SAS Enterprise Miner
  • data preparation for clustering
  • agglomerative clustering
  • interpreting clusters

Survival Analysis

  • origins of survival analysis
  • hazards and hazard charts
  • how business data is different from clinical data
  • calculating hazards empirically
  • calculating survival from retention
  • parametric hazard models
  • survival-based forecasting
  • competing risks
  • retention curves and survival curves
  • censoring
  • using SAS code in SAS Enterprise Miner to create survival curves

Association Rules

  • market basket analysis
  • sequential pattern analysis
  • using SAS Enterprise Miner to discover associations in retail data
  • association rules

Link Analysis

  • sphere of influence
  • background on graph theory
  • using link analysis to generate derived variables
  • Kleinberg's algorithm
  • graph-coloring algorithm

Genetic Algorithms

  • genetic algorithms
  • optimization techniques and problems (SAS/OR software)
  • other algorithms
  • linear programming problems

This is a QA approved partner course

Delivery Method

Classroom

Face-to-face learning in the comfort of our quality nationwide centres, with free refreshments and Wi-Fi.

Online booking is currently not available for this course. To find out more, please call us on 0345 074 7998 or email us at info@qa.com to discuss how we can help.

All third party trademark rights acknowledged.