Pattern Discovery with SAS
Class meeting: 3/07/2013, Thursday
Introduction to pattern discovery
Preview
1) CCWEB p3.1 to p358
2) AAEM61 Section 8.1
Before-class review
Contents | Duration | Notes
• Basic concepts (AAEM61 Section 8.1) • Distance measures between clusters • Questions for review: a. What is a centroid? b. Does it matter if different initial centroids are chosen to start clustering? | 15-30 min | View Slides #5 to #22. Reference: AAEM61 8.1
• Cluster analysis: K-means method (AAEM61 Section 8.2) • Questions for review: | 15-30 min | View Slides #24 to #43. Reference: AAEM61 8.2
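The two review questions (what a centroid is, and whether the choice of initial centroids matters) can be explored with a small sketch. The Python below is not part of the course materials, which use SAS EM; the function names `centroid`, `kmeans`, and `sse`, and the four-point toy dataset, are assumptions made only for illustration. It treats a centroid as the component-wise mean of a cluster's points and runs a plain K-means loop from two different starting positions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, init_centroids, max_iter=100):
    """Plain K-means: alternate assignment and update until nothing changes."""
    centroids = [tuple(float(v) for v in c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centroids.append(centroid(members) if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: the corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one centroid on the left side, one on the right side.
cents_a, labels_a = kmeans(points, [(0, 0.5), (4, 0.5)])
# Start B: one centroid on the bottom edge, one on the top edge.
cents_b, labels_b = kmeans(points, [(2, 0), (2, 1)])

sse_a = sse(points, cents_a, labels_a)
sse_b = sse(points, cents_b, labels_b)
print(labels_a, sse_a)
print(labels_b, sse_b)
```

On this toy dataset the two starts converge to different partitions (left/right versus bottom/top) with different within-cluster sums of squares, which suggests that the initial centroids can matter. Real datasets may be far less sensitive.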
Demonstrations & Exercises (70 minutes)
Demo# | Contents | Duration | Notes
1-1 | Exercise0: 1) Download the Excel file 2) Try it out at least once. | 5 min, 10 min | The Excel file is available at http://zlin.ba.ttu.edu/6347/Clustering.xls 1) 10 instances in the dataset 2) Two clusters are assumed
1-2 | Exercise1: Given two observations, A = (4, 2, 0, 1) and B = (3, 4, 1, 0): 1) Calculate the Euclidean distance between A and B 2) Calculate their cosine similarity | 5 min | See Slides #12 to #18.
1-3 | Exploring and Filtering Analysis Data. Exercise2: 1) Define dataset CENSUS2000 2) Explore CENSUS2000 3) Use a Filter node to clean up the dataset for clustering. | 6 min, 14 min | Dataset CENSUS2000 is in the AAEM61 library.
1-4 | Exercise3: 1) Clustering with CENSUS2000 2) Cluster CENSUS2000 with # of clusters = 10 3) Explore the clustering results | 8 min, 22 min | References: 1) CCWEB p3.1 to p358
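The two measures asked for in Exercise1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the vectors `p` and `q` are made up for the example and are deliberately not the A and B from the exercise. Euclidean distance is taken as the square root of the summed squared differences, and cosine similarity as the dot product divided by the product of the two vector lengths.

```python
import math


def euclidean_distance(a, b):
    """Square root of the sum of squared component differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical observations, chosen so the arithmetic is easy to check by hand.
p = (1, 2, 2, 0)
q = (2, 0, 1, 2)

print(euclidean_distance(p, q))  # sqrt(1 + 4 + 1 + 4) = sqrt(10)
print(cosine_similarity(p, q))   # 4 / (3 * 3) = 4/9
```

Substituting the exercise's own A and B for `p` and `q` gives a way to check a hand calculation. Note that the two measures answer different questions: distance depends on vector magnitude, while cosine similarity depends only on direction.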

In-class exercise deliverables:
1) The results of Exercise1
2) One screenshot for Exercise2
3) 2-3 screenshots for Exercise3
The screenshots must show your user ID on the bottom line of the SAS EM panel, as evidence of your participation in the class meeting.
Email address:
Subject: “ISQS6347 3/07/2013 <last name>”
Due: midnight on 3/07, Thursday.
The Exercise 3 instructions are on the network drive; Exercise 3 is due on April 2.