ISQS 6347 Homework Assignments
(A hard copy of your submission is required for all assignments unless otherwise specified.)
# | Assignments |
5 | Homework 5 (due 11 p.m., 04/11/2011, Monday):
1) DMTM9 p. 1-54, questions 1 and 2.
2) DMTM9 p. 1-61, question 3. |
4 | Homework 4 (due 03/28/2011, Monday):
1) AAEM61 pp. 8-58 to 8-59, Exercises for Chapter 8 (clustering).
2) AAEM61 pp. 8-78 to 8-79, Exercises for Chapter 8 (association analysis). |
3 | Homework 3 (due 02/28/2011, Monday):
1) AAEM61 p. 4-82, Exercises for Chapter 4.
2) AAEM61 p. 5-55, Exercises for Chapter 5.
3) AAEM61 p. 6-48, Exercises for Chapter 6.
4) AAEM61 p. 7-24, Exercises for Chapter 7.
The solutions appear right after each exercise, so you can compare your results with the answer keys.
Deliverables:
1) screenshots of the final results;
2) screenshots demonstrating your specific work;
3) your answers to the questions with blanks in the exercises. |
2 | Homework 2 (due 02/14/2011, Monday):
1) Check Section 4.1 of “Effective Web Mining” (document name: CCWEB_TKIT.pdf, pp. 4-1 to 4-34). Use the DMAIL dataset (in the shared space under the \Datasets\DATA_WM directory) to develop two decision tree models: a basic one with no parameter changes, and another that uses the Gini splitting criterion. Then add an Assessment node to the diagram to compare the performance of the two classification models. You don’t need to read the section in detail, since it is based on an older version of SAS EM. Instead, focus on (1) the explanations of the variables, (2) which variable is the target, and (3) which variables are configured (see p. 4-12). You can also explore the dataset to understand its quality and variable distributions. Feel free to try different splitting criteria (Chi-Square, Gini, and Entropy) and to vary other parameters. If you need more information about how to use SAS EM 5.3 to solve the problem, check Chapter 3 of AAEM61.
2) Construct a logistic regression model for the same dataset, and compare its results with those from the decision tree models.
3) AAEM61 pp. 3-111 to 3-112, Exercises for Chapter 3.
The deliverables include:
a. the model diagram;
b. one of the Assessment charts;
c. the performance table in the results of the Assessment node; and
d. short explanations of each of the results. |
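The Gini and Entropy splitting criteria that Homework 2 asks you to try both score a candidate split by how much it reduces node impurity. The Python sketch below computes both reductions for one binary split. The class counts are hypothetical, and the function names (gini, entropy, impurity_reduction) are illustrative. This is the textbook form of each measure, not SAS EM's internal implementation, and it does not cover the Chi-Square criterion.

```python
import math

def gini(counts):
    """Gini impurity 1 - sum(p_i^2) of a node, given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy -sum(p_i * log2 p_i) of a node; empty classes contribute 0."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def impurity_reduction(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of its child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

# Hypothetical split: a parent node with 40 responders and 60 non-responders
parent = [40, 60]
children = [[30, 10], [10, 50]]  # class counts in the left and right child

print(f"Gini reduction:    {impurity_reduction(parent, children, gini):.4f}")
print(f"Entropy reduction: {impurity_reduction(parent, children, entropy):.4f}")
```

To compare criteria, you only need to pass a different `impurity` function. This is the same idea as switching the splitting criterion in the Decision Tree node and rerunning.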
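Assessment charts such as lift charts compare models by ranking cases on their predicted probability. The minimal Python sketch below shows cumulative lift at a chosen depth: the response rate among the top-scored fraction of cases divided by the overall response rate. The function name and all data are hypothetical. This illustrates the general idea only and is not the Assessment node's exact computation or binning.

```python
def cumulative_lift(scores, labels, depth):
    """Cumulative lift at a depth given as a fraction of cases (0.2 = top 20%).

    Cases are ranked by model score, highest first. Lift is the response rate
    among the top-ranked cases divided by the overall response rate.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(depth * len(ranked)))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical targets (1 = responder) and scores from two hypothetical models
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
tree_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
logit_scores = [0.8, 0.3, 0.9, 0.2, 0.1, 0.7, 0.4, 0.35, 0.15, 0.05]

for name, scores in [("tree", tree_scores), ("logistic", logit_scores)]:
    print(f"{name}: lift at top 20% = {cumulative_lift(scores, labels, 0.2):.2f}")
```

At a depth of 100%, every model's lift is 1. The differences between models show up at shallow depths, where a better model packs more responders into its top-scored cases.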
1 | Homework 1 (due 01/31/2011, Monday):
1) Develop a decision tree manually using the credit card promotion data in the slide (the dataset with 15 observations). Choose one of the variables as the target. Once the decision tree is done, pick one rule that is explanatory enough to construct a confusion matrix, and report its lift, coverage rate, and accuracy rate.
2) A dataset has 1000 records and 50 variables, with 5% of the values missing, spread randomly throughout the records and variables. An analyst decides to remove records that have missing values. About how many records would you expect to be removed?
3) Consider the following three-class confusion matrix. The matrix shows the classification results of a supervised model that uses previous voting records to determine the political party affiliation (Republican, Democrat, or Independent) of members of the United States Senate.
   a. What percentage of the instances were correctly classified?
   b. According to the confusion matrix, how many Democrats are in the Senate? How many Republicans? How many Independents?
   c. How many Republicans were classified as belonging to the Democratic Party?
   d. How many Independents were classified as Republicans?
   e. What are the accuracy rates of the classification for each column?
   f. What are the coverage rates of the classification?
   g. What are the values of FP and FN? (Hint: split the matrix into three 2x2 matrices for Rep, Dem, and Ind.)
4) Go through AAEM Chapter 2. Use SAS Enterprise Miner 6.1 to complete the exercise on p. 2-62. Take screenshots of the results; a few that explain your work are enough. You need to define a new library, “AAEM61”, using the aaem61 data, which is now available in the shared directory. |
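For Homework 1, question 2, the reasoning can be sketched as follows. Assume "spread randomly" means that each of the k values in a record is missing independently with probability p. A record survives only if all k values are present. With n records, this gives:

```latex
P(\text{record complete}) = (1 - p)^{k},
\qquad
E[\text{records removed}] = n \left[ 1 - (1 - p)^{k} \right]
```

The independence assumption matters. If missing values cluster in a few records, far fewer records would be removed.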
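The hint in Homework 1, question 3, reduces a three-class confusion matrix to one 2x2 matrix per class (one-vs-rest). The Python sketch below does that reduction and derives per-class rates. It rests on several assumptions that you should check against the course slides:

- rows are actual classes and columns are predicted classes;
- "accuracy" for a column is the share of that column's predictions that are correct;
- "coverage" is the share of a class's actual instances that the model captures;
- lift is the column accuracy divided by the class's overall proportion.

The matrix values are hypothetical, not the matrix from the assignment. The same `rates` function also applies to the 2x2 matrix built from a single rule in question 1.

```python
def one_vs_rest(matrix, k):
    """Collapse a square confusion matrix into TP, FN, FP, TN for class k.

    Assumes rows are actual classes and columns are predicted classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k][j] for j in range(n)) - tp  # actual k, predicted other
    fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actual other
    tn = total - tp - fn - fp
    return tp, fn, fp, tn

def rates(tp, fn, fp, tn):
    """Column accuracy, coverage, and lift for one class (assumed definitions)."""
    accuracy = tp / (tp + fp)  # correct among cases predicted as this class
    coverage = tp / (tp + fn)  # share of this class's actual cases captured
    base_rate = (tp + fn) / (tp + fn + fp + tn)
    lift = accuracy / base_rate
    return accuracy, coverage, lift

# Hypothetical matrix (not the one in the assignment); order: Rep, Dem, Ind
classes = ["Rep", "Dem", "Ind"]
matrix = [
    [42, 2, 1],
    [5, 40, 3],
    [0, 3, 4],
]
total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(matrix)))
print(f"Overall correct: {correct / total:.1%}")
for k, name in enumerate(classes):
    tp, fn, fp, tn = one_vs_rest(matrix, k)
    acc, cov, lift = rates(tp, fn, fp, tn)
    print(f"{name}: TP={tp} FN={fn} FP={fp} TN={tn} "
          f"accuracy={acc:.2f} coverage={cov:.2f} lift={lift:.2f}")
```

If your course defines the matrix with predicted classes in the rows instead, transpose it before calling `one_vs_rest`.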