ISQS 7342-001 Business Analytics
Instructor:
TTh 12:00-3:20p, BA 363 (Computer Lab)
--------------------------------------------------------------------------------------------------------------------------------------
Textbooks and Lecture Materials:
· Decision Trees for Business Intelligence and Data Mining: Using SAS Enterprise Miner (DT)
· Decision Tree Modeling, SAS Course Notes (DMDT) *
· Applied Analytics Using SAS® Enterprise Miner™ 5, SAS Course Notes (AAEM) *
· Advanced Predictive Modeling Using SAS® Enterprise Miner, SAS Course Notes (PMADV) *
· Building and Solving Optimization Models with SAS/OR® (OROPT) *
* Note: The electronic versions are available to registered students
------- + ------ + ------- + ------- + ------- + -------
Lecture 1, 8/26/2008, Tuesday
Topic: Introduction
Reading assignments:
1) DT-1
2) DMDT-1
3) “Competing on Analytics,” by Thomas H. Davenport
------- + ------ + ------- + ------- + ------- + -------
Lecture 2, 8/28/2008, Thursday
Topic: Decision Tree Review
1) Decision tree review (how to split, when to stop, how to prune)
2) Getting familiar with SAS EM 5.2
3) Hands-on: Decision tree modeling with INSURANCE dataset
Reading assignments:
1) DT-1
2) AAEM-1
3) DMDT-1
4) “What’s New in SAS® Enterprise Miner™ 5.2,” SAS
------- + ------ + ------- + ------- + ------- + -------
Lecture 3, 9/2/2008, Tuesday
Topic: Descriptive, Predictive, and Explanatory Analyses
1) An evolutionary view of decision tree algorithms
2) Comparison among decision tree algorithms
Reading assignments:
1) DT-2
2) AAEM-2
3) “Tree-Based Models: Identification of Influential factors under Condition of Instability,” SAS SUGI2002 best paper
References:
1) CHAID Analysis, http://www.statsoft.com/textbook/stchaid.html
2) QUEST, http://www.stat.wisc.edu/~loh/quest.html
3) Comparison of different classification algorithms, http://www.stat.wisc.edu/~loh/class.pdf
4) “Tree Structured Data Analysis: AID, CHAID, and CART,” http://www.spss.com/research/wilkinson/publications/c&rtrees.pdf
5) “Decision Trees for Predictive Modeling,” http://www.sasenterpriseminer.com/documents/Decision%20Trees%20for%20Predictive%20Modeling.pdf
6) Colin R. Blyth, “On Simpson's Paradox and the Sure-Thing Principle,” Journal of the American Statistical Association, Vol. 67, No. 338 (Jun., 1972), pp. 364-366.
------- + ------ + ------- + ------- + ------- + -------
Lecture 4, 9/4/2008, Thursday
Topic: Recursive partitioning
1) Hands-on: Exploring CENSUS2000 dataset
2) The 7-step process of decision tree modeling
3) Hands-on: Recursive partitioning (DMDT-2)
Reading assignments:
1) DT-3
2) DMDT-2
3) “A Decision Analysis Method for Evaluating Computer Intrusion Detection Systems,” Jacob W. Ulvila, John E. Gaffney, Jr., Decision Analysis, Volume Number: 1, Issues: Mar
References:
1) DTREG, http://www.dtreg.com/index.htm
2) Handling missing data, http://people.cs.uu.nl/ad/pkdd99.pdf
Homework assignment 1 (due 9/16):
Use SAS EM 5.2 to mine HMEQ data set, following the instructions in Chapter 2 of course notes ADMT (used in ISQS 6347 class).
------- + ------ + ------- + ------- + ------- + -------
Lecture 5, 9/9/2008, Tuesday
Topic: The Mechanism of DT Construction - Recursive partitioning
1) Gini, Entropy and Chi-Square for decision tree modeling
2) P-value adjustments
3) Surrogate split
SAS Demonstration: HOUSING
Reading assignments: DT-3, DMDT-2
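Illustration (not part of the course materials): the three split criteria named in this lecture can be tried on a toy node with a short Python sketch. The class counts below are hypothetical, the function names are chosen only for this example, and SAS Enterprise Miner's own implementation (including its p-value adjustments) may differ in detail.

```python
import math

def gini(counts):
    """Gini impurity of a node, given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy (in bits) of a node, given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def split_gain(parent, children, impurity):
    """Impurity reduction: parent impurity minus size-weighted child impurity."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

def chi_square(children):
    """Pearson chi-square statistic of the branch-by-class table."""
    row_totals = [sum(row) for row in children]
    col_totals = [sum(col) for col in zip(*children)]
    n = sum(row_totals)
    stat = 0.0
    for row, r in zip(children, row_totals):
        for observed, c in zip(row, col_totals):
            expected = r * c / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical node: 40 events and 60 non-events, split into two branches.
parent = [40, 60]
children = [[30, 20], [10, 40]]

print("Gini reduction:   ", split_gain(parent, children, gini))
print("Entropy reduction:", split_gain(parent, children, entropy))
print("Chi-square:       ", chi_square(children))
```

A larger impurity reduction or a larger chi-square statistic both indicate a stronger split; the criteria can rank candidate splits differently, which is one reason to compare them.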
------- + ------ + ------- + ------- + ------- + -------
Lecture 6, 9/11/2008, Thursday
Topic: Pruning
1) Review of course structure
2) Top-down vs. bottom-up pruning
3) Prior probabilities
4) Profit-weighted pruning
5) Cross validation
SAS Demonstration: INSURANCE – pruning for profit, cross validation
Reading assignments: DMDT-3
------- + ------ + ------- + ------- + ------- + -------
Lecture 7, 9/16/2008, Tuesday
Topic: Auxiliary use of tree
1) Compare performance of different tree settings
2) Look into the use of PROC Arbor.
3) Input selection
4) Interactive training
Reading assignments: DMDT-4
SAS demonstration: CUSTOMERS as the test data set; INSURANCE interactive splitting
Reference: What is Regression tree? http://www.stat.cmu.edu/~cshalizi/350-2006/lecture-10.pdf
------- + ------ + ------- + ------- + ------- + -------
Lecture 8, 9/18/2008, Thursday
Topic: Ensembles of trees
1) Complete the topics in the last lecture
2) Forests
3) Bagged tree models
Reading assignments: DMDT-5
SAS demonstration: INSURANCE
Homework assignment 2 (due 9/30):
1) Go through the demonstrative SAS modeling cases in DMDT 4.4 and 4.5. Instructions:
2) Work out the Bagging model in Section 5.2. Check the SAS code to understand how it works. (Optional)
Optional readings:
1) Classification and Regression via Integer Optimization, Dimitris Bertsimas, Romy Shioda, Operations Research, Volume Number: 55, Issues: Mar-Apr. “Motivated by the significant advances in integer optimization in the past decade, we introduce mixed-integer optimization methods to the classical statistical ...”
2) Data Mining by Decomposition: Adaptive Search for Hypothesis Generation, Hemant K. Bhargava, INFORMS Journal on Computing, Volume Number: 11, Issues: Summer. “SAS develops software for building Web-based applications for data management, statisti...”
------- + ------ + ------- + ------- + ------- + -------
Lecture 9 and 10, 9/23 & 9/25 2008
1. Topics
Chapter 4. Business Intelligence and decision tree
1) Decision tree and cube construction
2) Decision tree and regression
Chapter 5. Theoretical issues in the decision tree growing process
1) Insights and expositions from decision trees (two presenters)
2) Multiple decision trees (two presenters)
Chapter 6. The integration of decision tree with other data mining approaches
1) Decision trees in forecasting
2) Decision tree in variable selection
3) Decision trees in analytical model development
1) Each of the above topics will be presented by a student; a discussant for each topic will be assigned as well.
2) Each presenter will budget up to 8 minutes to present and 5 minutes for discussion
3) 4-6 slides are required
4) The presenter will mainly cover the topic based on the relevant chapter in the textbook
5) In addition, the presenter must use examples or cases from other sources to support the theoretical contents in the textbook, which can be selected from the web, the SAS course notes, papers, or textbooks.
6) The presenter needs to demonstrate a solid understanding of the topic, the ability to connect it to the knowledge gained so far, and thorough preparation. However, a presenter who cannot follow the material well may instead state the problem clearly and open it to class discussion.
7) The discussant needs to prepare one or two meaningful questions for the presenter.
| No: | Date | Ref# | Title |
| 1 | 9/23 | 4-1 | Decision tree and cube construction |
| 2 | 9/23 | 4-2 | Decision tree and regression |
| 3 | 9/23 | 5-1a | Insights and expositions from decision trees (1) |
| 4 | 9/23 | 5-1b | Insights and expositions from decision trees (2) |
| 5 | 9/25 | 5-2a | Multiple decision trees (1) |
| 6 | 9/25 | 5-2b | Multiple decision trees (2) |
| 7 | 9/25 | 6-1 | Decision trees in forecasting |
| 8 | 9/25 | 6-2 | Decision tree in variable selection |
| 9 | 9/25 | 6-3 | Decision trees in analytical model development |
------- + ------ + ------- + ------- + ------- + -------
Lecture 11, 9/30/2008, Tuesday
Topic: Advanced Binary Prediction
1) Data set introduction and basic model
2) Improving input selection
SAS Demonstration: PVA_RAW_DATA – Basic modeling, Input selection
Reading assignments: PMADV-1
Review:
1) Go through the demonstration cases in Sections 1.1 and 1.2. They were not fully covered in class and are left for your review.
2) Review principal component analysis, using the references below.
References:
1) Principal component analysis http://en.wikipedia.org/wiki/Principal_components_analysis
2) A tutorial on principal component analysis http://www.snl.salk.edu/~shlens/pub/notes/pca.pdf
3) SAS Reading Material http://support.sas.com/publishing/pubcat/chaps/55129.pdf
4) Lecture material in PowerPoint
------- + ------ + ------- + ------- + ------- + -------
Lecture 12, 10/02/2008, Thursday
Topic: Advanced Binary Prediction
1) Recoding
2) Empirical Logits and model adequacy
SAS Demonstration: PVA_RAW_DATA – Variable clustering, All subset selection
Reading assignments: PMADV-1
Homework assignment 3 (due 10/15):
PMADV pages A-6 to A-9. Choose at least 7 problems from Exercises 1 to 10. These are good exercises, and everyone would benefit from doing all of them. Printed submission is required.
Note: The model in the exercise is built with SAS EM 5.1, while the lab’s SAS EM is v5.2. The minor differences between these two versions may raise some challenges in modeling. Record the problems you encounter and the approaches you take to solve them.
------- + ------ + ------- + ------- + ------- + -------
Lecture 13, 10/07/2008, Tuesday
Topic: Continuous and Multiple Target Prediction
1) 1998 KDD-Cup results surprising
2) Generalized profit matrices
3) Basic two-stage model
SAS Demonstration: PVA_RAW_DATA
Reading assignments: PMADV-2
------- + ------ + ------- + ------- + ------- + -------
Lecture 14, 10/09/2008, Thursday
Topic: Continuous and Multiple Target Prediction
1) Improved two-stage model
2) Profit variability
SAS Demonstration: PVA_RAW_DATA
Reading assignments: PMADV Chapter 3
Homework assignment 4 (due 10/30):
1) PMADV page A-9. Exercises 11 to 13.
2) A market segmentation model using clustering (To be determined)
------- + ------ + ------- + ------- + ------- + -------
Lecture 15, 10/14/2008, Tuesday (Rescheduled to BA271, 1:-2:20p)
Topic: Prediction limits
1) Profit variability
2) Decision Trees Review
Reading assignments: PMADV Chapter 3
------- + ------ + ------- + ------- + ------- + -------
Lecture 16, 10/16/2008, Thursday
Topic: Segmentation in
1) Segmentation in marketing
2) Segment-based descriptive models
3) Clustering review
Reading assignments: CRM Chapter 1-3
------- + ------ + ------- + ------- + ------- + -------
Lecture 17, 10/21/2008, Tuesday
Topic: Clustering modeling
1) Quiz 4
2) RFM Cell-based clustering
Reading assignments: CRM Chapter 4
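Illustration (not part of the course materials): one common way to form RFM cells is to score each customer from 1 to 5 on recency, frequency, and monetary value by rank, then concatenate the three scores into a cell label. The Python sketch below assumes that scheme; the customer records, the function names, and the choice of five bins are hypothetical, and the CRM text may define its cells differently.

```python
def ntile_scores(values, bins=5, higher_is_better=True):
    """Score each value from 1 (worst) to `bins` (best) by rank.

    Ties are broken by input order, which a production scheme
    would likely handle more carefully.
    """
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * bins // len(values) + 1
    return scores

def rfm_cells(customers, bins=5):
    """Map customer id -> RFM cell label such as '535'."""
    ids = [c[0] for c in customers]
    # Recency is days since last purchase, so smaller is better.
    r = ntile_scores([c[1] for c in customers], bins, higher_is_better=False)
    f = ntile_scores([c[2] for c in customers], bins)
    m = ntile_scores([c[3] for c in customers], bins)
    return {cid: f"{ri}{fi}{mi}" for cid, ri, fi, mi in zip(ids, r, f, m)}

# Hypothetical records: (id, recency in days, purchase count, total spend).
customers = [
    ("C1", 5, 12, 900.0),
    ("C2", 40, 3, 150.0),
    ("C3", 200, 1, 40.0),
    ("C4", 15, 8, 500.0),
    ("C5", 90, 2, 75.0),
]

cells = rfm_cells(customers)
for cid, cell in sorted(cells.items()):
    print(cid, cell)
```

Customers sharing a cell label form one segment; with five bins per dimension there are at most 125 cells.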
------- + ------ + ------- + ------- + ------- + -------
Lecture 18, 10/23/2008, Thursday
Topic: Clustering applications
1) Multi-attribute clustering
2) Clustering of many attributes
Reading assignments: CRM Chapter 5, 6
Homework assignment 5 (due 11/11):
1) Following the diagram in Figure 4.9 on page 70, complete the RFM clustering with (1) RFM cell-based segmentation and (2) tree-based segmentation using RFM.
2) Following the diagram in Figure 5.7 on page 84, add a Segment Profile node and complete the clustering model.
3) Following the lower diagram in Figure 5.21 on page 96, complete the decision tree clustering model.
4) Read chapter 6 carefully. Use the input reduction techniques learned from PMADV to explore reducing the variables in data set NYTOWNS. You may focus on univariate and variable clustering. The book presents an example, but you may change the parameters to test the effects under different settings. The SAS code used in the model from the earlier lecture, Varclus variable selection.sas, can be used. However, it may not necessarily work for a different dataset. Try it, do what you can, and report the results.
The above have been mostly covered and practiced in the classroom, but may not have been completed. They are good exercises for CRM segmentation. Submit your report on the above as a printed hard copy.
------- + ------ + ------- + ------- + ------- + -------
Lecture 19, 10/30/2008, Thursday
Topic: Advanced topics in CRM segmentation
1) Text mining review
2) Online text data collection – use of %tmfilter SAS macro (EM 4.3 vs. EM 5.2)
3) Demonstration – Hockey data text mining (EM 4.3 vs. EM 5.2)
Reading assignments: CRM Chapter 12; Getting Started with SAS 9.1 Text Miner Chapter 6
References:
Memory-based reasoning:
2) http://www.uv.mx/aguerra/publications/micai2000/micai2000.pdf
------- + ------ + ------- + ------- + ------- + -------
Lecture 20, 11/04/2008, Tuesday
Topic: Text Mining with SAS EM 5.2
1) Federalist papers
2) SASPDF
------- + ------ + ------- + ------- + ------- + -------
Lecture 21, 11/06/2008, Thursday
Topic: Introduction to Mathematical Optimization
1) Quiz 5
2) Simple example
3) The OPTMODEL Procedure
------- + ------ + ------- + ------- + ------- + -------
Lecture 22, 11/11/2008, Tuesday
Topic: Linear Programming: Basics (1)
1) Formulating and solving linear programming problems
2)
Optional readings:
1) P. S. Bradley, Usama M. Fayyad, O. L. Mangasarian, “Mathematical Programming for Data Mining: Formulations and Challenges,” Journal: INFORMS Journal on Computing, Volume Number: 11, Issues: Summer. This article is intended to serve as an overview of a rapidly emerging research and app...
2) Syam Menon, Sumit Sarkar, “Minimizing Information Loss and Preserving Privacy,” Journal: Management Science, Volume Number: 53, Issues: One. The need to hide sensitive information before sharing databases has long been recognized. In the context of data mining, sensitive information often takes the ...
Homework assignment 6 (due 11/25):
Solve the following Linear Programming problem:
Ed Butler is the manager of a company that produces three types of spare parts for automobiles. The manufacture of each part requires processing on each of two machines, with the following processing times (in hours):
| Machine | Part A | Part B | Part C |
| 1 | 0.02 | 0.03 | 0.05 |
| 2 | 0.05 | 0.02 | 0.04 |
Each machine is available 40 hours per month. Each part manufactured will yield a unit profit as follows:
| | Part A | Part B | Part C |
| Profit | $50 | $40 | $30 |
Ed wants to determine the mix of spare parts to produce in order to maximize total profit.
(1) Formulate a linear programming model for this problem
(2) Write SAS code using the OPTMODEL Procedure to solve the problem. Try to follow the example in OROPT Section 2.2 to code the program in three forms:
1. Explicit form using the %let macro
2. Arrays
3. Index sets
(3) Do the same as the above but read the data from a file and write the results to another file.
(4) Report the results from normal simplex, primal simplex, and iterative interior point method with a short comment. Present one of the three programs.
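Illustration (not part of the assignment, and not a substitute for the OPTMODEL programs it asks for): the formulation in step (1) can be cross-checked with a small Python sketch. With A, B, and C as the monthly quantities of each part, the model is to maximize 50A + 40B + 30C subject to 0.02A + 0.03B + 0.05C <= 40, 0.05A + 0.02B + 0.04C <= 40, and A, B, C >= 0. Because the feasible region is bounded, an optimum lies at a vertex, so the sketch simply enumerates vertices with exact fractions. The variable and function names are chosen only for this example; a real solver would use simplex or interior point methods instead of enumeration.

```python
from fractions import Fraction
from itertools import combinations

# Data from the assignment: hours per unit on each machine, unit profits.
hours = {
    1: [Fraction("0.02"), Fraction("0.03"), Fraction("0.05")],
    2: [Fraction("0.05"), Fraction("0.02"), Fraction("0.04")],
}
capacity = Fraction(40)
profit = [Fraction(50), Fraction(40), Fraction(30)]

# Every constraint as (coefficients, rhs) meaning coefficients . x <= rhs.
rows = [(hours[1], capacity), (hours[2], capacity)]
for j in range(3):  # non-negativity written as -x_j <= 0
    rows.append(([Fraction(-1 if k == j else 0) for k in range(3)], Fraction(0)))

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve a 3x3 linear system by Cramer's rule; None if singular."""
    d = det3(a)
    if d == 0:
        return None
    solution = []
    for j in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][j] = b[i]
        solution.append(det3(m) / d)
    return solution

# A vertex is a feasible point where three constraints hold with equality.
best_value, best_x = None, None
for trio in combinations(rows, 3):
    x = solve3([list(coef) for coef, _ in trio], [rhs for _, rhs in trio])
    if x is None:
        continue
    feasible = all(sum(a * xi for a, xi in zip(coef, x)) <= rhs
                   for coef, rhs in rows)
    if feasible:
        value = sum(p * xi for p, xi in zip(profit, x))
        if best_value is None or value > best_value:
            best_value, best_x = value, x

print("A, B, C =", [float(v) for v in best_x])
print("Total profit =", float(best_value))
```

The result of this sketch can be compared with the objective value and solution that each OPTMODEL solver reports in step (4); all three solvers should agree on the optimal profit.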
------- + ------ + ------- + ------- + ------- + -------
Lecture 23, 11/13/2008, Thursday
Topic: Linear Programming: Basics (2)
1) Writing data
2) Dual values
------- + ------ + ------- + ------- + ------- + -------
Lecture 24, 11/18/2008, Tuesday
Topic: Linear Programming: More topics
1) Control flow and operations
2) Model updates
------- + ------ + ------- + ------- + ------- + -------
Lecture 25, 11/20/2008, Thursday
Topic: Introductory Integer and Mixed-Integer Linear Programming (1)
1) Introduction
2) Solving integer and mixed-integer linear problems using the OPTMODEL Procedure
------- + ------ + ------- + ------- + ------- + -------
Lecture 26, 11/25/2008, Tuesday
Topic: Introductory Integer and Mixed-Integer Linear Programming (2)
1) Quiz 6
2) ILP & MILP models using binary variables
------- + ------ + ------- + ------- + ------- + -------
Thanksgiving break
------- + ------ + ------- + ------- + ------- + -------
Lecture 27, 12/02/2008, Tuesday
Workshop 2
------- + ------ + ------- + ------- + ------- + -------
Lecture 28, 12/03/2008, Wednesday
Workshop 2
------- + ------ + ------- + ------- + ------- + -------