Pattern Discovery with SAS Enterprise Miner


Class meeting: 03/07/2013, Thursday

 

Go through the following course contents:

http://zlin.ba.ttu.edu/6347/PatternRecognition2013-3.htm 

 

-----------------------------------------------

Exercise 3

Instructions:

-----------------------------------------------------------------

Part I

 

Start working on this part on March 7.

1. Construct the following clustering diagram. The dataset is DMAIL, in the directory \Datasets\DATA_WM\ on the ISQS6347 shared drive. Cluster this dataset and compare the results of the clustering model with those from the classification models you developed in the homework. For information about the SOM/Kohonen model, see Section 7.2 of the SAS course notes ADMT_001.pdf and slides #73-79 of DM10-PatternDiscovery.pptx.

 

 

2. Node configurations

1) Input Source node - DMAIL:

a. Set ResponseFlag and TotalSpent as “Rejected”

b. Make sure the type of ProspectID is ID

2) Cluster node: Standardization, 8 clusters

3) SOM/Kohonen node: Column = 2, Row = 4, Standardization

3. Run both nodes

Problem 1: Based on your understanding, explain in one sentence why we exclude ResponseFlag and TotalSpent from the clustering.

Problem 2: Compare the results from the two nodes and report anything you consider significant.
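
Both nodes in step 2 use the Standardization setting. The sketch below illustrates why rescaling matters for distance-based clustering. It assumes standardization means z-score scaling (subtract the mean, divide by the standard deviation); the exact formula the Enterprise Miner nodes apply is not stated in this handout, and the two input columns are hypothetical values, not taken from DMAIL.

```python
from statistics import mean, pstdev

def standardize(column):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information for clustering.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical inputs on very different scales (NOT from DMAIL).
income = [30000.0, 45000.0, 60000.0, 75000.0]
visits = [1.0, 2.0, 3.0, 4.0]

z_income = standardize(income)
z_visits = standardize(visits)

# After standardization both columns are on a comparable scale.
print(z_income)
print(z_visits)
```

Without rescaling, a Euclidean distance between two records would be dominated by the column with the larger numeric range (here, income).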

4. Run the Segment Profile nodes. 

Problem 3: Compare the results from the two Segment Profile nodes and report any significant findings.

5. Study the distributions of ResponseFlag and TotalSpent across the values of _SEGMENT_.

Problem 4: Compare the outcomes with the results from the decision tree as you have done in Homework Assignment 2.

6. Problem 5: Write a paragraph to summarize the above findings. Question: How well do the clustering outcomes capture the target variable in the classification model?

-----------------------------------------

Part II

Start this part on March 21, 2013

 

Explore the dataset BANK and work out the following analyses:

1)     Association analysis (Association)

2)     Sequence analysis (Association (2))

3)     Market Basket

4)     Path analysis


 

 

The report will include the results as follows:

  1. Identify the three most frequently accessed bank services.
  2. List the three most frequent sequence patterns of bank service access.
  3. Present two charts showing the relationship between support and confidence, and explain the information contained in the charts.
  4. Draw a 3-D chart of support, confidence and lift.
  5. Two findings from Market Basket.
  6. Two findings from Path Analysis.
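
The report items above rely on support, confidence and lift. As a reminder of how these three measures relate, here is a minimal sketch that computes them from a list of transactions. The function names and the transactions are hypothetical (the item labels "A", "B", "C" are placeholders, not BANK services), and the code is not how Enterprise Miner computes its results.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, lhs, rhs):
    """support(lhs and rhs) / support(lhs); undefined if lhs never occurs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

def lift(transactions, lhs, rhs):
    """confidence(lhs => rhs) / support(rhs); 1.0 means no association."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)

# Hypothetical transactions with placeholder items (NOT the BANK dataset).
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
]

rule_support = support(transactions, {"A", "B"})
rule_confidence = confidence(transactions, {"A"}, {"B"})
rule_lift = lift(transactions, {"A"}, {"B"})
print(rule_support, rule_confidence, rule_lift)
```

For the rule A => B in this toy data, support is 2/4, confidence is (2/4)/(3/4) = 2/3, and lift is (2/3)/(3/4) = 8/9, so A and B occur together slightly less often than independence would predict.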

---------------------------------------
Part III: Optional

Note: Before you work on the second subquestion of question 1, you need to know the Apriori principle, which tells you how to reduce the number of candidate itemsets to make association analysis more efficient. See Slide #110 in DM10-PatternDicovery.ppt.
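
The Apriori principle mentioned in the note says that every superset of an infrequent itemset is also infrequent, so those supersets never need to be counted. The sketch below illustrates the idea on four hypothetical items A, B, C, D (deliberately not the items used in the questions that follow). The helper names are made up for this example, and the empty itemset is left out by assumption.

```python
from itertools import combinations

def all_itemsets(items):
    """Every non-empty itemset that can be formed from items."""
    items = sorted(items)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]

def pruned_by(infrequent, itemsets):
    """Itemsets that contain the infrequent itemset (including itself).

    By the Apriori principle none of these can be frequent.
    """
    infrequent = frozenset(infrequent)
    return [s for s in itemsets if infrequent <= s]

items = {"A", "B", "C", "D"}              # hypothetical items
candidates = all_itemsets(items)           # 2**4 - 1 non-empty itemsets
skipped = pruned_by({"A", "B"}, candidates)

print(len(candidates))
print(sorted(sorted(s) for s in skipped))
```

With four items there are 15 non-empty itemsets; knowing that {A, B} is infrequent removes four of them ({A, B}, {A, B, C}, {A, B, D}, {A, B, C, D}) from consideration without scanning the data.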

 

1.       Use W, L and C to represent items “Watch Promo”, “Life Ins Promo”, and “Credit Card Ins”.

1)      List all itemsets generated from items {W, L, C}.

2)      If (W, C) is not a frequent itemset, which itemsets are also infrequent? 

3)      Based on the following table, (a) identify the rule that has the highest support, (b) draw the contingency table for the rule, and (c) calculate its support, confidence and lift.

 

ID   Watch Promo (W)   Life Ins Promo (L)   Credit Card Ins. (C)
1    Yes               Yes                  No
2    Yes               Yes                  Yes
3    Yes               Yes                  No
4    No                Yes                  Yes
5    Yes               No                   Yes

 

2.       Add the negation of “Credit Card Ins” as an item, denoted ~C. Based on the above table, find the association rule that contains ~C in the antecedent (left-hand side) and has the highest support. What are its confidence and lift?
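
One way to treat a negation as an item is to recode each Yes/No row so that a "No" becomes an explicit "~name" item. The sketch below shows that recoding on hypothetical columns P and Q (not the table above); the helper name to_items is made up for this example.

```python
def to_items(row):
    """Turn a {name: bool} row into a set of items, using '~name' for False."""
    return {name if present else "~" + name for name, present in row.items()}

# Hypothetical Yes/No rows (NOT the table from question 1).
rows = [
    {"P": True, "Q": False},
    {"P": True, "Q": True},
    {"P": False, "Q": False},
]

transactions = [to_items(r) for r in rows]

# Support of the itemset {~Q, P}: share of rows where Q is No and P is Yes.
support_notq_p = sum(1 for t in transactions if {"~Q", "P"} <= t) / len(transactions)

print(transactions)
print(support_notq_p)
```

Once negated items exist in the transactions, rules with a negated antecedent are evaluated with the same support, confidence and lift definitions as any other rule.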