ISQS 6347 Homework Assignments

 

(A hard copy of each submission is required for all assignments unless otherwise specified.)

 

#

Assignments

6

Homework 6/Exercise 6 (due 04/28/2015, Tuesday)

TBD

(AAEM61 p.7-24, Exercises for Chapter 7.)

5

Homework 5/Exercise 5 (due 04/14/2015, Tuesday):

Textual data coding. Details will be provided.

4

Homework 4 (due 03/31/2015, Tuesday):

1) AAEM61 p.8-58 to 8-59, Exercises for Chapter 8 (clustering).

2) AAEM61 p.8-78 to 8-79, Exercises for Chapter 8 (association analysis).

Deliverables:

1)     The screenshots of the final results

2)     The screenshots demonstrating your specific work

3)     Your answers to the questions with blanks in the exercises

 

3

Homework 3 (due 03/03/2015, Tuesday):

1)     AAEM61 p.4-82, Exercises for Chapter 4.

2)     AAEM61 p.6-48, Exercises for Chapter 6.

Develop your own solution to each exercise before comparing your results with the answer keys.

Deliverables:

1)     The screenshots of the final results

2)     The screenshots demonstrating your specific work

3)     Your answers to the questions with blanks in the exercises

 

2

Homework 2 (due 02/17/2015, Tuesday):

1)     Check Section 4.1 of “Effective Web Mining” (document name: CCWEB_TKIT.pdf, pages 4-1 to 4-34). Use the DMAIL dataset (in the shared space under the \Datasets\DATA_WM directory) to develop two decision tree models: a basic one with no parameter changes, and another that uses the Gini splitting criterion. Then add an Assessment node to the diagram to compare the performance of the two classification models. You don’t need to read the section in detail, since it is based on an older version of SAS EM; focus on (1) the explanations of the variables, (2) which variable is the target, and (3) which variables are configured (see p.4-12). You can also explore the dataset to understand its quality and variable distributions. Feel free to try the different splitting criteria (Chi-Square, Gini, and Entropy) and other parameters. If you need more information about how to use SAS EM 5.3 to solve the problem, check Chapter 3 of AAEM61.

2)     AAEM61 p.3-111 to 3-112, Exercises for Chapter 3.

 

The deliverables include

a.     the model diagram,

b.     one of the Assessment charts,

c.     the performance table in the results of the Assessment node, and

d.     short explanations to each of the results.
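
For reference, the Gini and Entropy criteria named in Homework 2 can be sketched in a few lines of Python. This is a minimal illustration of the textbook definitions only; the function names are made up for this sketch, and the way SAS Enterprise Miner computes and applies these criteria internally may differ.

```python
from math import log2

def gini(counts):
    """Gini impurity of a node, given the count of each target class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy (in bits) of a node, given the count of each target class."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def impurity_reduction(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

# Hypothetical node with 5 records of each class, split into two pure children.
print(gini([5, 5]))
print(entropy([5, 5]))
print(impurity_reduction([5, 5], [[5, 0], [0, 5]], gini))
```

A split is preferred when it gives a larger impurity reduction; the two criteria often, but not always, rank candidate splits the same way.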

 

1

Homework 1 (due 02/03/2015, Tuesday):

1)     Develop a decision tree manually using the credit card promotion data in the slide (the one with 15 observations). You need to choose one of the variables as the target. Once the decision tree is done, pick one rule that is informative enough to build a confusion matrix from, and indicate its lift, coverage rate, and accuracy rate.

2)     A dataset has 1000 records and 50 variables, with 5% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove all records that have missing values. About how many records would you expect to be removed?

3)     Consider the following three-class confusion matrix. The matrix shows the classification results of a supervised model that uses previous voting records to determine the political party affiliation (Republican, Democrat, or Independent) of members of the United States Senate.

 

 

         Rep    Dem    Ind

Rep       42      2      1

Dem        5     40      3

Ind        0      3      4

a.     What percent of the instances were correctly classified?

b.     According to the confusion matrix, how many Democrats are in the Senate? How many Republicans? How many Independents?

c.     How many Republicans were classified as belonging to the Democratic Party?

d.     How many Independents were classified as Republicans?

e.     What are the precision rates of the classification for each column?

f.      What are the coverage rates of the classification?

g.     What are the values of FP and FN for each class? (Hint: split the matrix into three 2x2 matrices, one each for Rep, Dem, and Ind.)
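
The hint in (g) amounts to a one-vs-rest split of the matrix. A minimal Python sketch of that idea follows, using a made-up three-class matrix rather than the Senate data. It assumes that rows are actual classes and columns are predicted classes; the assignment does not state the orientation, so check it against the course slides. The function names are hypothetical.

```python
def one_vs_rest_counts(matrix, labels):
    """Return {label: (TP, FP, FN, TN)} for each class.

    Assumption: matrix[i][j] counts records of actual class i
    that were predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    counts = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # actual k, predicted as another class
        fp = sum(row[k] for row in matrix) - tp  # predicted k, actually another class
        tn = total - tp - fn - fp
        counts[label] = (tp, fp, fn, tn)
    return counts

def accuracy(matrix):
    """Fraction of records on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Made-up example data, not the matrix from the assignment.
labels = ["A", "B", "C"]
matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]

for label, (tp, fp, fn, tn) in one_vs_rest_counts(matrix, labels).items():
    precision = tp / (tp + fp)  # correct share of the records predicted as this class
    print(label, "TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn, "precision:", round(precision, 3))
print("accuracy:", round(accuracy(matrix), 3))
```

If the course defines the matrix with predicted classes in the rows instead, the FP and FN values simply swap.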

 

4)     Go through AAEM Chapter 2. Use SAS Enterprise Miner 6.1 to complete the exercise on p.2-62. Include screenshots of the results; a few that explain your work are enough. You need to define a new library “AAEM61” using the aaem61 dataset, which is now available in the shared directory.
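
One way to reason about listwise deletion, as in question 2 of Homework 1, is sketched below in Python. It assumes that every cell is missing independently with the same probability, which is one reading of "spread randomly"; other readings would give a different estimate. The function name and the example numbers are made up for illustration and are not the ones from the question.

```python
def expected_records_removed(n_records, n_variables, p_missing):
    """Expected number of records that contain at least one missing value.

    Assumption: each cell is missing independently with probability p_missing.
    """
    p_complete = (1.0 - p_missing) ** n_variables  # record has no missing cell
    return n_records * (1.0 - p_complete)

# Hypothetical example: 200 records, 10 variables, 2% of cells missing.
print(round(expected_records_removed(200, 10, 0.02), 1))
```

Even a small per-cell missing rate can remove a large share of the records once the number of variables grows, because a record survives only if every one of its cells is present.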