ISQS 7342-001 Business Analytics

 

Instructor: Zhangxi Lin

 


 

TTh 12:00-3:20p, BA 363 (Computer Lab)

--------------------------------------------------------------------------------------------------------------------------------------

Textbooks and Lecture Materials:

·         Decision Trees for Business Intelligence and Data Mining: Using SAS Enterprise Miner - (DT)

·         CRM Segmentation and Clustering Using SAS Enterprise Miner (CRM)

·         Decision Tree Modeling, SAS Course Notes (DMDT) *

·         Applied Analytics Using SAS® Enterprise Miner™ 5, SAS Course Notes (AAEM) *

·         Advanced Predictive Modeling Using SAS® Enterprise Miner, SAS Course Notes (PMADV) *

·         Building and Solving Optimization Models with SAS/OR® (OROPT) *

* Note: Electronic versions of the starred materials are available to registered students.

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 1, 8/26/2008, Tuesday

 

Topic: Introduction

 

Reading assignments:

1)     DT-1

2)     DMDT-1

3)     “Competing on Analytics,” by Thomas H. Davenport

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 2, 8/28/2008, Thursday

 

Topic: Decision Tree Review

1)     Decision tree review (how to split, when to stop, how to prune)

2)     Getting familiar with SAS EM 5.2

3)     Hands-on: Decision tree modeling with INSURANCE dataset

Reading assignments:

1)     DT-1

2)     AAEM-1

3)     DMDT-1

4)     “What’s New in SAS® Enterprise Miner™ 5.2,” SAS

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 3, 9/2/2008, Tuesday

 

Topic: Descriptive, Predictive, and Explanatory Analyses

1)     An evolutionary view of decision tree algorithms

2)     Comparison among decision tree algorithms

Reading assignments:

1)     DT-2

2)     AAEM-2

3)     “Tree-Based Models: Identification of Influential factors under Condition of Instability,” SAS SUGI 2002 best paper

References:

1)     CHAID Analysis, http://www.statsoft.com/textbook/stchaid.html

2)     QUEST, http://www.stat.wisc.edu/~loh/quest.html

3)     Comparison of different classification algorithms http://www.stat.wisc.edu/~loh/class.pdf

4)     “Tree Structured Data Analysis: AID, CHAID, and CART,” http://www.spss.com/research/wilkinson/publications/c&rtrees.pdf

5)     “Decision Trees for Predictive Modeling,” http://www.sasenterpriseminer.com/documents/Decision%20Trees%20for%20Predictive%20Modeling.pdf

6)     Colin R. Blyth, “On Simpson's Paradox and the Sure-Thing Principle,” Journal of the American Statistical Association, Vol. 67, No. 338 (Jun. 1972), pp. 364-366.

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 4, 9/4/2008, Thursday

 

Topic: Recursive partitioning

1)     Hands-on: Exploring CENSUS2000 dataset

2)     The 7-step process of decision tree modeling

3)     Hands-on: Recursive partitioning (DMDT-2)

Reading assignments:

1)     DT-3

2)     DMDT-2

3)     “A Decision Analysis Method for Evaluating Computer Intrusion Detection Systems,” Jacob W. Ulvila and John E. Gaffney, Jr., Decision Analysis, Vol. 1, Issue: Mar

References:

1)     DTREG, http://www.dtreg.com/index.htm

2)     Handling missing data, http://people.cs.uu.nl/ad/pkdd99.pdf

 

Homework assignment 1 (due 9/16):

 

Use SAS EM 5.2 to mine the HMEQ data set, following the instructions in Chapter 2 of the ADMT course notes (used in the ISQS 6347 class).

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 5, 9/9/2008, Tuesday

 

Topic: The Mechanism of DT Construction - Recursive partitioning

1)     Gini, Entropy and Chi-Square for decision tree modeling

2)     P-value adjustments

3)     Surrogate split

SAS Demonstration: HOUSING

 

Reading assignments: DT-3, DMDT-2
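The three split criteria listed for this lecture can be illustrated with a short Python sketch. It applies the textbook definitions to one candidate binary split: Gini impurity 1 - sum(p^2), entropy -sum(p log2 p), and the Pearson chi-square statistic on the child-by-class table. It is an illustration only and does not reproduce how the SAS EM Decision Tree node computes or adjusts these criteria. The function names are hypothetical.

```python
import math

def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for the class counts in one node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy -sum(p_k * log2 p_k); empty classes contribute 0."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def split_gain(parent, children, impurity):
    """Impurity reduction: parent impurity minus the size-weighted child impurity."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted

def chi_square(children):
    """Pearson chi-square statistic for the child-by-class contingency table."""
    n = sum(sum(ch) for ch in children)
    n_classes = len(children[0])
    class_totals = [sum(ch[k] for ch in children) for k in range(n_classes)]
    stat = 0.0
    for ch in children:
        row_total = sum(ch)
        for k in range(n_classes):
            expected = row_total * class_totals[k] / n
            if expected > 0:  # skip classes absent from every child
                stat += (ch[k] - expected) ** 2 / expected
    return stat

# A 50/50 parent node split into [40, 10] and [10, 40]
parent, children = [50, 50], [[40, 10], [10, 40]]
print(split_gain(parent, children, gini), split_gain(parent, children, entropy), chi_square(children))
```

For this split, the Gini reduction is 0.18, the entropy reduction is about 0.278 bits, and the chi-square statistic is 36. A larger chi-square value corresponds to a smaller p-value, which is the quantity the p-value adjustments in this lecture then correct.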

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 6, 9/11/2008, Thursday

 

Topic: Pruning

1)     Review of course structure

2)     Top-down vs. bottom-up pruning

3)     Prior probabilities

4)     Profit-weighted pruning

5)     Cross validation

SAS Demonstration: INSURANCE – pruning for profit, cross validation

 

Reading assignments: DMDT-3

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 7, 9/16/2008, Tuesday

 

Topic: Auxiliary uses of trees

1)     Comparing the performance of different tree settings

2)     Using PROC ARBOR

3)     Input selection

4)     Interactive training

Reading assignments: DMDT-4

 

SAS demonstration: CUSTOMERS as the test data set; INSURANCE interactive splitting

 

Reference: What is a regression tree? http://www.stat.cmu.edu/~cshalizi/350-2006/lecture-10.pdf

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 8, 9/18/2008, Thursday

 

Topic: Ensembles of trees

1)     Complete the remaining topics from the previous lecture

2)     Forests

3)     Bagged tree models

 

Reading assignments: DMDT-5

 

SAS demonstration: INSURANCE
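Bagging fits the same tree learner to many bootstrap samples (drawn with replacement) and combines the fitted trees by majority vote. The Python sketch below shows only that mechanism. To keep it short, one-split "stumps" on a single numeric input stand in for full decision trees. It is not the SAS EM implementation, and all names and parameters are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split tree on one numeric input: try each observed value as the
    threshold and keep the one with the fewest misclassifications, predicting
    the majority class on each side."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]   # never empty: t is an observed x
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Fit each stump on a bootstrap sample and combine the stumps by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict
```

Usage: `predict = bagged_stumps(list(range(1, 11)), [0] * 5 + [1] * 5)`, then call `predict(x)` for a new input. Each bootstrap stump places its threshold slightly differently, and the vote averages out that instability. This is the variance reduction that motivates bagging for trees.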

 

Homework assignment 2 (due 9/30):

 

1)     Work through the SAS modeling demonstrations in DMDT Sections 4.4 and 4.5. Instructions:

  • Identify every decision tree application in the model that counts as “auxiliary,” and write a few sentences of comment on each one. These comments are your deliverables.
  • Expect some difficulties: the instructions in these two sections are not precise enough on their own. Read the preceding sections to recover the omitted information and the default settings described there.
  • Apply what you learn from these exercises to your workshop presentation, so start the homework as soon as possible.

2)     (Optional) Build the bagging model in Section 5.2 and examine its SAS code to understand how it works.

 

Optional readings:

1)       “Classification and Regression via Integer Optimization,” Dimitris Bertsimas and Romy Shioda, Operations Research, Vol. 55, Issue: Mar-Apr.

Motivated by the significant advances in integer optimization in the past decade, we introduce mixed-integer optimization methods to the classical statistical ...

2)       “Data Mining by Decomposition: Adaptive Search for Hypothesis Generation,” Hemant K. Bhargava, INFORMS Journal on Computing, Vol. 11, Issue: Summer.

SAS develops software for building Web-based applications for data management, statisti...

------- + ------ + ------- + ------- + ------- + -------

 

Lectures 9 and 10, 9/23/2008 and 9/25/2008, Tuesday and Thursday

 

1.     Topics

Chapter 4. Business Intelligence and decision tree

1)     Decision tree and cube construction

2)     Decision tree and regression

 

Chapter 5. Theoretical issues in the decision tree growing process

1)     Insights and expositions from decision trees (two presenters)

2)     Multiple decision trees (two presenters)

Chapter 6. The integration of decision trees with other data mining approaches

1)     Decision trees in forecasting

2)     Decision trees in variable selection

3)     Decision trees in analytical model development

 

2.     Instructions:

1)     Each of the above topics will be presented by a student; a discussant for each topic will be assigned as well.

2)     Each presenter has up to 8 minutes to present, followed by 5 minutes of discussion.

3)     4-6 slides are required

4)     The presenter will mainly cover the topic based on the relevant chapter in the textbook

5)     In addition, the presenter must support the theoretical content of the textbook with examples or cases from other sources, such as the web, the SAS course notes, papers, or other textbooks.

6)     The presenter needs to demonstrate a solid understanding of the topic, the ability to connect it to the knowledge gained so far, and thorough preparation. However, if the presenter cannot follow some of the content, they may clearly state the problem and open it up for class discussion.

7)     The discussant needs to prepare one or two meaningful questions for the presenter.

 

3.     Schedule:

No.   Date   Ref#   Title
1     9/23   4-1    Decision tree and cube construction
2     9/23   4-2    Decision tree and regression
3     9/23   5-1a   Insights and expositions from decision trees (1)
4     9/23   5-1b   Insights and expositions from decision trees (2)
5     9/25   5-2a   Multiple decision trees (1)
6     9/25   5-2b   Multiple decision trees (2)
7     9/25   6-1    Decision trees in forecasting
8     9/25   6-2    Decision trees in variable selection
9     9/25   6-3    Decision trees in analytical model development

 

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 11, 9/30/2008, Tuesday

 

Topic: Advanced Binary Prediction

1)     Data set introduction and basic model

2)     Improving input selection

SAS Demonstration: PVA_RAW_DATA – Basic modeling, Input selection

Reading assignments: PMADV-1

 

Review:

1)      Work through the demonstration cases in Sections 1.1 and 1.2. They were not fully covered in class and are left for your review.

2)      Review principal component analysis using the references below.

References:

1)      Principal component analysis http://en.wikipedia.org/wiki/Principal_components_analysis

2)      A Tutorial on principal component analysis http://www.snl.salk.edu/~shlens/pub/notes/pca.pdf

3)      SAS Reading Material http://support.sas.com/publishing/pubcat/chaps/55129.pdf

4)      A lecture material in PowerPoint

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 12, 10/02/2008, Thursday

 

Topic: Advanced Binary Prediction

 

1)     Recoding

2)     Empirical Logits and model adequacy

SAS Demonstration: PVA_RAW_DATA – Variable clustering, All subset selection

Reading assignments: PMADV-1
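Before an input enters a logistic regression, empirical logits are a common way to check whether it has a roughly linear relationship with the log-odds of a binary target. The Python sketch below makes three assumptions: the input is cut into equal-count bins, each bin's logit uses the conventional 0.5 adjustment ln((events + 0.5) / (non-events + 0.5)), and there are at least as many cases as bins. Whether PMADV uses exactly these choices is an assumption, and the function name is hypothetical.

```python
import math

def empirical_logits(xs, ys, n_bins):
    """Sort cases by x, cut them into n_bins equal-count bins, and return
    (mean x, empirical logit) for each bin. ys holds 0/1 target values.
    Assumes len(xs) >= n_bins so that no bin is empty."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    result = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        events = sum(y for _, y in chunk)
        nonevents = len(chunk) - events
        mean_x = sum(x for x, _ in chunk) / len(chunk)
        # Adding 0.5 keeps the logit finite when a bin has no events or no non-events
        result.append((mean_x, math.log((events + 0.5) / (nonevents + 0.5))))
    return result
```

Plotting the returned logits against the bin means gives a quick adequacy check. A roughly straight line suggests the input can enter the model as is. A clear curve suggests that recoding or a transformation may help.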

 

Homework assignment 3 (due 10/15):

 

PMADV pages A-6 to A-9: choose at least 7 of Exercises 1 to 10. These are good exercises, and everyone will benefit from doing all of them. A printed submission is required.

 

Note: The model in the exercises was built with SAS EM 5.1, while the lab runs SAS EM 5.2. The minor differences between these two versions may cause some modeling challenges. Record the problems you encounter and how you solved them.

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 13, 10/07/2008, Tuesday

 

Topic: Continuous and Multiple Target Prediction

1)     Surprising results from the 1998 KDD Cup

2)     Generalized profit matrices

3)     Basic two-stage model

SAS Demonstration: PVA_RAW_DATA

Reading assignments: PMADV-2
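The decision logic behind a generalized profit matrix and a basic two-stage model can be sketched in a few lines of Python. Stage 1 estimates the probability of a response. Stage 2 estimates the expected amount given a response. A case is worth soliciting when probability × amount exceeds the solicitation cost. The profit-matrix version picks whichever decision has the highest expected profit under the posterior probabilities. The cost and profit figures below are hypothetical, and so are all function names.

```python
def expected_profit(p_response, expected_amount, cost):
    """Two-stage score: P(response) from stage 1 times the expected amount
    among responders from stage 2, minus the cost of soliciting."""
    return p_response * expected_amount - cost

def solicit(candidates, cost):
    """candidates: (id, p_response, expected_amount) tuples.
    Return the ids whose expected profit from soliciting is positive."""
    return [cid for cid, p, amt in candidates if expected_profit(p, amt, cost) > 0]

def best_decision(posteriors, profit_matrix):
    """posteriors: {outcome: probability};
    profit_matrix: {decision: {outcome: profit}}.
    Return the decision with the highest expected profit."""
    return max(profit_matrix,
               key=lambda d: sum(posteriors[o] * profit_matrix[d][o] for o in posteriors))

# Hypothetical profit matrix: mailing costs 1.0 and a response is worth 15.0
matrix = {"mail": {"respond": 14.0, "no": -1.0},
          "skip": {"respond": 0.0, "no": 0.0}}
print(best_decision({"respond": 0.1, "no": 0.9}, matrix))
```

With a 10% response probability, mailing has an expected profit of 1.4 - 0.9 = 0.5, so `best_decision` returns `"mail"`. At 5%, the expected profit of mailing is negative and the rule switches to `"skip"`.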

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 14, 10/09/2008, Thursday

 

Topic: Continuous and Multiple Target Prediction

 

1)     Improved two-stage model

2)     Profit variability

SAS Demonstration: PVA_RAW_DATA

Reading assignments: PMADV Chapter 3

 

Homework assignment 4 (due 10/30):

 

1)      PMADV page A-9, Exercises 11 to 13.

2)      A market segmentation model using clustering (To be determined)

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 15, 10/14/2008, Tuesday (Rescheduled to BA 271, 1:00-2:20p)

 

Topic: Prediction limits

 

1)     Profit variability

2)     Decision Trees Review

 

Reading assignments: PMADV Chapter 3

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 16, 10/16/2008, Thursday

 

Topic: Segmentation in CRM

 

1)     Segmentation in marketing

2)     Segment-based descriptive models

3)     Clustering review

Reading assignments: CRM Chapters 1-3

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 17, 10/21/2008, Tuesday

 

Topic: Clustering modeling

 

1)     Quiz 4

2)     RFM cell-based clustering

Reading assignments: CRM Chapter 4
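RFM cell-based segmentation scores each customer on recency, frequency, and monetary value and then combines the three scores into a cell code. The Python sketch below makes several assumptions. It uses quintile scores from 1 (least desirable) to 5 (most desirable). It treats recency as days since the last purchase, so fewer days is better. It breaks ties by input order. The CRM book's exact binning rules may differ, and the function names are hypothetical.

```python
def quantile_scores(values, n_groups=5, higher_is_better=True):
    """Rank the values and cut the ranking into n_groups equal-count groups,
    scored 1..n_groups, where n_groups is the most desirable group.
    Ties are broken by input order because Python's sort is stable."""
    n = len(values)
    # Ascending order puts the least desirable values first when higher is better;
    # for recency (days since last purchase) lower is better, so sort descending.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * n_groups // n + 1
    return scores

def rfm_cells(recency_days, frequency, monetary, n_groups=5):
    """Concatenate the R, F, and M scores into a cell code such as '535'."""
    r = quantile_scores(recency_days, n_groups, higher_is_better=False)
    f = quantile_scores(frequency, n_groups)
    m = quantile_scores(monetary, n_groups)
    return [f"{a}{b}{c}" for a, b, c in zip(r, f, m)]
```

For example, `rfm_cells([10, 200, 50, 5, 100], [8, 1, 3, 10, 2], [400, 20, 150, 900, 60])` returns `['444', '111', '333', '555', '222']`. The fourth customer lands in cell 555: the most recent, most frequent, and highest-spending customer.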

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 18, 10/23/2008, Thursday

 

Topic: Clustering applications

 

1)     Multi-attribute clustering

2)     Clustering of many attributes

Reading assignments: CRM Chapters 5 and 6

 

Homework assignment 5 (due 11/11):

1)     Following the diagram in Figure 4.9 on page 70, complete the RFM clustering using (1) RFM cell-based segmentation and (2) tree-based segmentation using RFM.

2)     Following the diagram in Figure 5.7 on page 84, add a Segment Profile node and complete the clustering model.

3)     Following the lower diagram in Figure 5.21 on page 96, complete the decision tree clustering model.

4)     Read Chapter 6 carefully. Use the input reduction techniques learned from PMADV to explore reducing the variables in the NYTOWNS data set. You may focus on univariate methods and variable clustering. The book presents an example, but you may change the parameters to test the effects of different settings. The SAS code from the previously lectured model, Varclus variable selection.sas, can be used; however, it may not necessarily work for a different data set. Try it, do what you can, and report the results.

Most of the above has been covered and practiced in class but may not have been completed. These are good exercises for CRM segmentation. Submit your report as a printed hard copy.

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 19, 10/30/2008, Thursday

 

Topic: Advanced topics in CRM segmentation

 

1)     Text mining review

2)     Online text data collection – use of the %tmfilter SAS macro (EM 4.3 vs. EM 5.2)

3)     Demonstration – Hockey data text mining (EM 4.3 vs. EM 5.2)

Reading assignments: CRM Chapter 12; Getting Started with SAS 9.1 Text Miner, Chapter 6

 

References:

Memory-based reasoning:

1)     http://jp.fujitsu.com/group/labs/downloads/en/business/activities/activities-4/fujitsu-labs-bikm-001-en.pdf

2)     http://www.uv.mx/aguerra/publications/micai2000/micai2000.pdf

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 20, 11/04/2008, Tuesday

 

Topic: Text Mining with SAS EM 5.2

 

1)     Federalist Papers

2)     SASPDF

 

Readings: DMTM9

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 21, 11/06/2008, Thursday

 

Topic: Introduction to Mathematical Optimization

 

1)     Quiz 5

2)     Simple example

3)     The OPTMODEL Procedure

Readings: OROPT-1

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 22, 11/11/2008, Tuesday

 

Topic: Linear Programming: Basics (1)

 

1)     Formulating and solving linear programming problems

2)     Reading data

Readings: OROPT-2

 

Optional readings:

1)     P. S. Bradley, Usama M. Fayyad, and O. L. Mangasarian, “Mathematical Programming for Data Mining: Formulations and Challenges,” INFORMS Journal on Computing, Vol. 11, Issue: Summer.

This article is intended to serve as an overview of a rapidly emerging research and app...

2)     Syam Menon and Sumit Sarkar, “Minimizing Information Loss and Preserving Privacy,” Management Science, Vol. 53, Issue: One.

The need to hide sensitive information before sharing databases has long been recognized. In the context of data mining, sensitive information often takes the ...

 

Homework assignment 6 (due 11/25):

 

Solve the following Linear Programming problem:

 

Ed Butler is the manager of a company that produces three types of automobile spare parts. Each part must be processed on each of two machines, with the following processing times (in hours per part):

 

Machine   Part A   Part B   Part C
1         0.02     0.03     0.05
2         0.05     0.02     0.04

 

Each machine is available 40 hours per month. Each part manufactured yields the following unit profit:

 

Part      A     B     C
Profit    $50   $40   $30

 

Ed wants to determine the mix of spare parts to produce in order to maximize total profit.

(1)   Formulate a linear programming model for this problem

(2)   Write SAS code using the OPTMODEL procedure to solve the problem. Follow the example in OROPT Section 2.2 to code the program in three forms:

1.     Explicit form using the %let macro

2.     Arrays

3.     Index sets

(3)   Do the same as the above but read the data from a file and write the results to another file.

(4)   Report the results from the normal simplex, primal simplex, and iterative interior point methods, with a short comment. Present one of the three programs.

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 23, 11/13/2008, Thursday

 

Topic: Linear Programming: Basics (2)

 

1)     Writing data

2)     Dual values

Readings: OROPT-2

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 24, 11/18/2008, Tuesday

 

Topic: Linear Programming: More topics

 

1)     Control flow and operations

2)     Model updates

 

Readings: OROPT-2, 3

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 25, 11/20/2008, Thursday

 

Topic: Introductory Integer and Mixed-Integer Linear Programming (1)

 

1)     Introduction

2)     Solving integer and mixed-integer linear problems using the OPTMODEL Procedure

 

Readings: OROPT-4

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 26, 11/25/2008, Tuesday

 

Topic: Introductory Integer and Mixed-Integer Linear Programming (2)

 

1)     Quiz 6

2)     ILP & MILP models using binary variables

 

Readings: OROPT-4

 

------- + ------ + ------- + ------- + ------- + -------

 

Thanksgiving break

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 27, 12/02/2008, Tuesday

 

Workshop 2

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 28, 12/03/2008, Wednesday

 

Workshop 2

 

------- + ------ + ------- + ------- + ------- + -------