# Applied Clustering Techniques

Course code TPCLUS94
Duration 2 Days

This course covers the theory and practice of a wide range of clustering techniques available in SAS, including cluster preprocessing, variable clustering, k-means clustering, and hierarchical clustering.

# Prerequisites

Before attending this course, you should

• be able to execute SAS programs and create SAS data sets. You can gain this experience by completing the SAS Programming 1: Essentials course.
• have completed a graduate-level course in statistics or the Introduction to Statistics Using SAS 9.2: ANOVA, Linear Regression and Logistic Regression course.
• have an understanding of matrix algebra.

# Who should attend

Intermediate- or senior-level statisticians, data analysts, and data miners

# Delegates will learn how to

• prepare and explore data for a cluster analysis
• distinguish among many different clustering techniques, making informed choices about which to use
• evaluate the results of a cluster analysis
• determine the appropriate number of clusters to retain
• profile and describe clustered observations
• score observations into clusters.

# Outline

Introduction to Clustering

• identifying types of clustering
• measuring similarity
• assessing multivariate normality
• using classification matrices

Preparation for Clustering

• preparing data for cluster analysis
• using variable clustering for variable selection
• using graphical clustering aids
• making elongated clusters more spherical
• viewing the impact of input standardization

Partitive Clustering

• k-means clustering for segmentation
• outlining the advantages of nonparametric clustering

Hierarchical Clustering

• comparing hierarchical clustering methods

Assessing Clustering Results

• determining the number of clusters
• profiling a cluster solution
• scoring new observations

Cluster Analysis Case Study

• variable selection
• graphical exploration of selected variables
• hierarchical clustering and determining the number of clusters
• profiling the seven-cluster solution
• modelling cluster membership
• scoring the customer database

Canonical Discriminant Analysis (CDA) Plots

• canonical discriminant plots

Fuzzy Clustering

• Q-methodology

Assessing Multivariate Normality

• assessing multivariate normality

This is a QA approved partner course

All third party trademark rights acknowledged.