Pattern Discovery with SAS Enterprise Miner

Class meeting: 3/07/2013, Thursday

 

Introduction to pattern discovery

 

Preview Readings:

1)     CCWEB p3-1 to p3-58

2)     AAEM61 Section 8.1

3)     TSK chapter 8

 

Before-class review

Contents

Duration

Notes

  Basic concepts (AAEM61 Section 8.1)

  Distance measures between clusters

  Questions for review:

a. What is a centroid?

b. Does the choice of initial centroids affect the clustering result?

15-30 min

 

View Slides #5 to #22.

Reference: AAEM61 8.1
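As a small illustration of the review questions above, the sketch below computes a centroid as the component-wise mean of a cluster's points and uses the distance between two centroids as one possible between-cluster distance measure. The two clusters and the helper names `centroid` and `centroid_distance` are hypothetical and do not come from AAEM61 or the slides.

```python
import math

def centroid(points):
    """Return the component-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def centroid_distance(cluster_a, cluster_b):
    """Euclidean distance between the centroids of two clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))

# Hypothetical 2-D clusters, chosen only for illustration.
cluster_1 = [(1.0, 1.0), (2.0, 1.0), (3.0, 4.0)]
cluster_2 = [(8.0, 8.0), (9.0, 10.0), (10.0, 9.0)]

print(centroid(cluster_1))   # (2.0, 2.0)
print(centroid(cluster_2))   # (9.0, 9.0)
print(centroid_distance(cluster_1, cluster_2))
```

Centroid distance is only one option. Other measures, such as the distance between the closest or the farthest pair of points, can rank the same clusters differently.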

  Cluster analysis

  K-means method (AAEM61 Section 8.2)

  Questions for review:

  1. Why do we use a Filter node?
  2. How is the number of clusters determined?
  3. Why do we need to standardize the input variables before clustering?

 

15-30 min

View Slides #24 to #43.

AAEM61 8.2
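To make review question 3 concrete, the sketch below shows how an input measured on a large scale can dominate Euclidean distance until the inputs are standardized. It uses z-scores as one common choice of standardization; the income and household-size values are hypothetical, and this is not a description of what the SAS Enterprise Miner Cluster node does internally.

```python
import math
import statistics

def standardize(column):
    """Z-score a list of numbers: subtract the mean, divide by the std. dev."""
    mean = statistics.mean(column)
    sd = statistics.pstdev(column)
    return [(x - mean) / sd for x in column]

# Hypothetical records: income in dollars and household size.
incomes = [30000.0, 32000.0, 90000.0]
sizes = [1.0, 6.0, 1.5]

raw = list(zip(incomes, sizes))
scaled = list(zip(standardize(incomes), standardize(sizes)))

# On the raw scale, income differences swamp household-size differences.
print(math.dist(raw[0], raw[1]), math.dist(raw[0], raw[2]))

# After standardization, both inputs contribute on a comparable scale.
print(math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))
```

On the raw scale, record 0 looks far closer to record 1 than to record 2, even though records 0 and 1 differ sharply in household size. After standardization the two distances are of similar magnitude.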

 

Demonstrations & Exercises (70 minutes)

Demo#

Contents

Duration

Notes

1-1

Clustering with Excel

Exercise-0:

1) Download the Excel file

2) Try it out at least once.

5 min

10 min

The Excel file is available at http://zlin.ba.ttu.edu/6347/Clustering.xls

1) 10 instances in the dataset

2) Two clusters are assumed
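For readers who want to see the same kind of calculation outside Excel, here is a minimal k-means sketch with 10 instances and two clusters. The data points and starting centroids are hypothetical and are not taken from Clustering.xls; the function `kmeans` is an illustrative plain implementation, not the spreadsheet's or SAS Enterprise Miner's method.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return labels, centroids

# Ten hypothetical 2-D instances forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (1.5, 1.5),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0), (8.5, 8.5)]

# Start from one point in each group (an arbitrary choice for this sketch).
labels, centroids = kmeans(points, [points[0], points[5]])
print(labels)     # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(centroids)  # [(1.5, 1.5), (8.5, 8.5)]
```

Passing different starting centroids to `kmeans` is an easy way to experiment with how the initial choice affects the result.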

1-2

Exercise-1:

Given two observations:

A = (4,2,0,1) and B = (3,4,1,0),

1)      Calculate the Euclidean distance between A and B.

2)      Calculate their cosine similarity.

5 min

See Slides #12 to #18.
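The two measures in Exercise-1 can be sketched in a few lines of Python, using the standard definitions: Euclidean distance is the square root of the summed squared differences, and cosine similarity is the dot product divided by the product of the vector lengths. The vectors `p` and `q` below are hypothetical and deliberately different from A and B, so the exercise itself is still left to do by hand.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical four-dimensional observations (not the exercise's A and B).
p = (3, 0, 4, 0)
q = (0, 0, 4, 0)

print(euclidean_distance(p, q))  # 3.0
print(cosine_similarity(p, q))   # 0.8
```

Note that the two measures answer different questions: Euclidean distance depends on magnitude, while cosine similarity depends only on the angle between the vectors.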

 

1-3

Exploring and Filtering Analysis Data

Exercise-2:

1) Define dataset CENSUS2000

2) Explore CENSUS2000

3) Use a Filter node to clean up the dataset for clustering.

6 min

14 min

Dataset CENSUS2000 is in the AAEM61 library.

1-4

Creating Clusters

Exercise-3:

1) Clustering with CENSUS2000

2) Cluster CENSUS2000 with the number of clusters set to 10

3) Explore the clustering results

8 min

22 min

References:

1)     CCWEB p3-1 to p3-58

2)      TSK chapter 8

 

 

In-class exercise deliverables:

1)      The results of Exercise-1

2)      One screenshot for Exercise-2

3)      2-3 screenshots for Exercise-3

The screenshots must include the information showing your user ID at the bottom line of the SAS EM panel. This is to show students’ participation in the class meeting.

Email address:

Isqs6347@gmail.com

Subject:

“ISQS6347 3/07/2013 <last name>”

Due midnight on 3/07, Thursday

 

The instructions for Exercise 3, which is due on April 2, are on the network drive.