ISQS 6347 Data & Text Mining Lecture Outlines

 

Instructor: Zhangxi Lin

 


 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 21, Date: 04/30/07, Monday

 

Topic:  Quiz 7 and Exercise 7

 

Guideline

 

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 20, Date: 04/27/07, Friday

 

Topic:  Web-based Recommender Systems

 

Reading: Effective Web Mining: Attracting and Keeping Valued Cyber Consumers (CCWEB), Chapter 6

 

Slides: WebMining3

 

Lecture outline:

1)       Principles of online recommendation systems

2)       Project 2 Q/A

3)       Exercise 6 (Web mining)

 

Datasets: MOVIEBUY

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 19, Date: 04/25/07, Wednesday

 

Topic:  Web Mining

 

Reading: Effective Web Mining: Attracting and Keeping Valued Cyber Consumers (CCWEB), Chapter 5

Slides: WebMining2

 

Lecture outline:

1)       Quiz 6 review

2)       RLINKS dataset analysis review – SAS EG application, Link analysis, Association analysis

3)       Propensity-to-buy

4)       Banner ads (exploratory exercise)

5)       Online consumer profiling and segmentation

 

Datasets: PROPBUY, BANNER, BANNERAD, PROFILE

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 18, Date: 04/23/07, Monday

 

Topic:  Introduction to Web Mining

 

Reading: Effective Web Mining: Attracting and Keeping Valued Cyber Consumers (CCWEB), Chapters 1-4

 

Slides: WebMining1

 

Lecture outline:

1)       Quiz 6 (Text Mining)

2)       Introduction to Web Mining

3)       SAS Enterprise Guide v4.3

4)       Review of Text mining (Chapter 3 of TMUS)

 

Datasets: FSLINKS, RLINKS, Commrex web log

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 17, Date: 04/20/07, Friday

 

Topic:  Text Mining – Predictive Modeling

 

Reading: Text Mining Using SAS Software, Chapter 3

 

Slides: TM-4

 

Lecture outline:

1)       Model inputs derived from text mining

2)       Predictive modeling with text mining inputs – Insurance Subrogation

3)       Exercise 5 (Text Mining)

 

SAS Datasets: Insurance, Start list

 

Homework assignment 6 (Due on April 27, Friday):

1)       TMUS Chapter 2 Exercise

2)       Replicate the steps of the insurance claim example in TMUS 3.3. (Report only (1) the model diagram and (2) the response chart from the Assessment node.) Why are the outcomes from the SVD-based regression better than those from the cluster-ID-based regression?

3)       Explain the configuration of SVD and roll-up terms. What are the differences between them?

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 16, Date: 04/18/07, Wednesday

Topic:  Text Mining – Exploratory Analysis

Reading:

1)       Text Mining Using SAS Software, Chapter 2

 

Slides: TM-3

 

Online reading materials on Hidden Markov Models (HMM):

1)       http://jedlik.phy.bme.hu/~gerjanos/HMM/node2.html

2)       http://www.csse.monash.edu.au/~lloyd/tildeMML/Structured/HMM.html

3)       http://www.autonlab.org/tutorials/hmm14.pdf

 

Lecture outline:

1)       Quiz 5 Review

2)       Exploration using Text Miner – SASPDF

3)       Document clustering - AMAZON

4)       About hierarchical clustering

 

SAS Dataset: SASPDF.xls, STOPLIST, SASPDF, AMAZON, NEWS

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 15, Date: 04/16/07, Monday

Topic:  Text Mining - Preliminary

Reading:

1)       Text Mining Using SAS Software (TMUS), Chapter 1

2)       GSTM Ch5, pp31-36

Slides: TM-2

 

Online materials for Singular Value Decomposition (SVD):

1)       Basics of Matrix: http://www.xycoon.com/matrix_algebra.htm

2)       http://mathworld.wolfram.com/SingularValueDecomposition.html

3)       http://www.uwlax.edu/faculty/will/svd/

4)       http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm

5)       http://kwon3d.com/theory/jkinem/svd.html

 

Agenda:

1)       Quiz 5 (Association Analysis)

2)       Processing textual data

3)       Transformations

 

Review questions:

1)       What are the main issues in converting unstructured text to structured data?

2)       How is the SVD approach applied to text mining?
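
Illustration for review question 2 (not part of the original course materials): a minimal sketch of how an SVD turns a term-by-document matrix into a small number of "concept" coordinates per document. The four terms and four documents are made up, and the leading singular triple is found by plain power iteration on A^T A, which is a simple stand-in for illustration and not a description of how SAS Text Miner computes its SVD dimensions.

```python
import math

# Hypothetical toy term-by-document count matrix (rows = terms, columns = documents).
TERMS = ["claim", "injury", "movie", "ticket"]
A = [
    [1, 1, 0, 0],  # claim
    [1, 1, 0, 0],  # injury
    [0, 0, 1, 1],  # movie
    [0, 0, 1, 0],  # ticket
]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def leading_singular_triple(matrix, iterations=200):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    matrix_t = transpose(matrix)
    v = [1.0] * len(matrix[0])
    for _ in range(iterations):
        w = matvec(matrix_t, matvec(matrix, v))  # (A^T A) v
        length = norm(w)
        v = [x / length for x in w]
    av = matvec(matrix, v)
    sigma = norm(av)
    u = [x / sigma for x in av]
    return sigma, u, v


sigma, u, v = leading_singular_triple(A)

# Coordinate of each document on the leading concept (sigma * v_j).
doc_scores = [sigma * x for x in v]

print("leading singular value:", round(sigma, 4))
print("term weights:", dict(zip(TERMS, (round(x, 4) for x in u))))
print("document scores:", [round(x, 4) for x in doc_scores])
```

In this toy matrix the first two documents share the terms "claim" and "injury", so they receive the same non-zero score on the leading concept while the other two documents score essentially zero. A predictive model can then use such scores as numeric inputs in place of the raw term counts.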

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 14, Date: 04/13/07, Friday

Topic:  Association Analysis Demonstration and Preliminary Text Mining

Reading:

1)       CCWEB_TKIT Section 2.1, 2.2

2)       Getting Started with SAS Text Miner (GSTM), Ch1-4 (PDF, pp1-30)

3)       RG Ch11 (pp342-343)

 

Slides: DM9, TM-1

 

Agenda:

1)       Exercise 3 review

2)       Dissociation analysis

3)       Application of association analysis in web mining

4)       Introduction to text mining

5)       Exercise 4 (Association Analysis)

 

Homework assignment 5 (Due on April 20, Friday):

1)       Association analysis problem:

a.       Read Chapter 5 of Data Mining Using SAS Enterprise Miner: A Case Study Approach (DMCS) carefully

b.       Redo the association analysis example ASSOCS. (The dataset is in the SAS EM library SAMPSIO, as are all the datasets used in these course notes. Some have different names; for example, DMWEB in the book is WEBPATH in SAMPSIO.)

c.       Report the outcomes

2)       Explore the example ABSTRACT in GSTM and answer the following questions:

a.       What differences result from using the default stop list STOPLST instead of the requested stop list SUGISTOP?

b.       What differences result from unchecking “Stemmed words as root form”?

c.       What are the differences between the Hierarchical and Expectation-Maximization settings in the configuration of the Text Miner node?

 

----- + ------ + ------- + ------- + ------- + -------

 

Lecture 13, Date: 04/11/07, Wednesday

Topic:  Association Analysis

Reading:

1)       RG Chapter 3 (pp78-84)

2)       ADMT Ch 8.1

 

Slides: DM9

 

Agenda:

1)       Quiz 4 (Clustering)

2)       Homework 3 review – classification with BUY dataset

3)       Basic concepts of association analysis

 

Review questions: 

1)       What are the differences between the datasets used for association analysis and the ones used for clustering or classification?

2)       Does the order of the items in an itemset matter?
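
Illustration for review question 2 (not part of the original course materials): a minimal sketch of support and confidence on a made-up set of market baskets. The item names and transactions are hypothetical and are not taken from the ASSOCS dataset. The sketch suggests that an itemset is an unordered set, whereas a rule built from it has a direction, so its confidence can differ when the two sides are swapped.

```python
# Hypothetical market baskets; each transaction is an unordered set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    items = frozenset(itemset)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    both = frozenset(lhs) | frozenset(rhs)
    return support(both, baskets) / support(lhs, baskets)


# The same itemset written in two orders is one and the same set.
same_itemset = frozenset(["beer", "diapers"]) == frozenset(["diapers", "beer"])

print("itemsets equal regardless of order:", same_itemset)
print("support {beer, diapers}:", support({"beer", "diapers"}, transactions))
print("confidence beer -> diapers:", confidence({"beer"}, {"diapers"}, transactions))
print("confidence diapers -> beer:", confidence({"diapers"}, {"beer"}, transactions))
```

Here every basket with beer also contains diapers, but only two of the three baskets with diapers contain beer, so the two rule directions have different confidence even though the underlying itemset is identical.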

 

------ + ------ + ------- + ------- + ------- + -------

 

Lecture 12, Date: 04/06/07, Friday

 

Topic:  Association Analysis

Reading:

1)       RG Chapter 3 (pp78-84)

2)       ADMT Ch 8.1

 

Slides: DM9

 

Agenda:

1)       Basic concepts of association analysis

2)       Homework 2 project – Classification modeling using dataset DMAIL. How to optimize the results.

3)       Exercise 3 (Clustering)

 

Review questions: 

1)       RG p102, review question 2.

2)       Use the clustering worksheet (http://zlin.ba.ttu.edu/6347/Clustering.xls ) to explore different outcomes of clustering. Modify the coordinates of the instances to obtain different datasets. Check the outcomes. Selectively record the k-means clustering iterations for 2 different sets of instances, including the illustrative charts.
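
Illustration (not part of the original course materials): a minimal sketch of the k-means iterations that the worksheet exercise asks you to record. The six 2-D points and the two starting centroids are made up and are not taken from Clustering.xls; the worksheet may also differ in details such as how it picks starting centroids.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means; returns final labels, final centroids and a per-iteration history."""
    centroids = [tuple(c) for c in initial_centroids]
    history = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid where it was
        history.append((labels, updated))
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    return labels, centroids, history


# Hypothetical instances and starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids, history = kmeans(points, [(1, 1), (2, 1)])

for step, (step_labels, step_centroids) in enumerate(history, start=1):
    print(step, step_labels, step_centroids)
```

Changing the starting centroids or the coordinates and rerunning shows how the recorded iterations, and sometimes the final clusters, depend on the initial choice.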

 

Homework assignment 4 (Due on April 13, Friday):

 

1)       RG p103, Computational Question 10 (feel free to use the clustering worksheet http://zlin.ba.ttu.edu/6347/Clustering.xls )

2)       Use a clustering approach to analyze dataset S3358 (ISQS 3358 student survey data) in the shared directory under the \Other_Data subdirectory. Report a few findings with selected screenshots of the representative results. The following are a few questions that could interest the instructor:

1.       Into how many groups should the students be divided? Why?

2.       What are the characteristics of each group?

3.       Which factors are more important in clustering the students?

 

------ + ------ + ------- + ------- + ------- + -------

 

Lecture 11, Date: 04/04/07, Wednesday

 

Topic:  Implementing Clustering

Reading:

1)       Applying Data Mining Techniques Using Enterprise Miner (ADMT), Chapter 7

2)       Data Mining Using SAS Enterprise Miner: A Case Study Approach (DMCS), Chapter 4

3)       RG Chapter 10 (Section 10.4)

4)       Hierarchical clustering, http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/hierarchical.html, http://www.resample.com/xlminer/help/HClst/HClst_intro.htm

5)       Ward algorithm http://www.chemaxon.com/jchem/doc/user/Ward.html

 

Slides: DM8

 

Agenda:

1)       Clustering analysis with SAS Enterprise Miner

2)       Self-organizing map – a neural network method

 

Review questions:

1)       RG p102 Review question 5

2)       RG p323 Review question 3

 

------ + ------ + ------- + ------- + ------- + -------

 

Lecture 10, Date: 04/02/07, Monday

 

Topic:  Introduction to Clustering

Reading:

1)       Applying Data Mining Techniques Using Enterprise Miner (ADMT), Chapter 7

2)       RG Chapter 3 (Section 3.3)

3)       Clustering: An Introduction, http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/index.html

4)       K-means clustering tutorial, http://people.revoledu.com/kardi/tutorial/kMean/index.html

 

Slides: DM8

Other: Clustering demo

 

Agenda:

1)       Quiz 3 (Classification)

2)       Principle of clustering

 

Review questions:

1)       What are the main differences between clustering and classification in data mining?

2)       Check the datasets that you found for Homework 1. Do they fit the clustering or the classification method?

3)       RG p102, Review question 3.

 

------ + ------ + ------- + ------- + ------- + -------

 

Lecture 9, Date: 03/30/07, Friday

 

Topic:  Neural Network for classification

Reading:

1)       Applying Data Mining Techniques Using Enterprise Miner (ADMT), Chapter 5

2)       RG Chapter 8

Slides: DM7

 

Agenda:

1)       Principle of neural network for data mining

2)       Home equity loan decision – Neural network

 

Review qu