ISQS 6347 Data & Text Mining Lecture Outlines
Instructor:
------- + ------ + ------- + ------- + ------- + -------
Lecture 21, Date: 04/30/07, Monday
Topic: Quiz 7 and Exercise 7
------- + ------ + ------- + ------- + ------- + -------
Lecture 20, Date: 04/27/07, Friday
Topic: Web-based Recommender Systems
Slides: WebMining3
Lecture outline:
1) Principle of online recommendation systems
2) Project 2 Q/A
3) Exercise 6 (Web mining)
Datasets: MOVIEBUY
------- + ------ + ------- + ------- + ------- + -------
Lecture 19, Date: 04/25/07, Wednesday
Topic: Web Mining
Slides: WebMining2
Lecture outline:
1) Quiz 6 review
2) RLINKS dataset analysis review – SAS EG application, link analysis, association analysis
3) Propensity-to-buy
4) Banner ads (exploratory exercise)
5) Online consumer profiling and segmentation
Datasets: PROPBUY, BANNER, BANNERAD, PROFILE
------- + ------ + ------- + ------- + ------- + -------
Lecture 18, Date: 04/23/07, Monday
Topic: Introduction to Web Mining
Slides: WebMining1
Lecture outline:
1) Quiz 6 (Text Mining)
2) Introduction to Web Mining
3) SAS Enterprise Guide v4.3
4) Review of text mining (Chapter 3 of TMUS)
Datasets: FSLINKS, RLINKS, Commrex web log
------- + ------ + ------- + ------- + ------- + -------
Lecture 17, Date: 04/20/07, Friday
Topic: Text Mining – Predictive Modeling
Slides: TM-4
Lecture outline:
1) Model inputs derived from text mining
2) Predictive modeling with text mining inputs – Insurance Subrogation
3) Exercise 5 (Text Mining)
SAS Datasets: Insurance, Start list
Homework assignment 6 (Due on April 27, Friday):
1) TMUS Chapter 2 Exercise
2) Replicate the steps of the insurance claim example in TMUS 3.3. (There is no need to report all outcomes; report only (1) the model diagram and (2) the response chart from the Assessment node.) Why are the outcomes of the SVD-based regression better than those of the cluster-ID-based regression?
3) Explain the configuration of SVD and roll-up terms. What are the differences between them?
------- + ------ + ------- + ------- + ------- + -------
Lecture 16, Date: 04/18/07, Wednesday
Topic: Text Mining – Exploratory Analysis
Reading:
1) Text Mining Using SAS Software, Chapter 2
Slides: TM-3
Online reading materials on the Hidden Markov Model (HMM):
1) http://jedlik.phy.bme.hu/~gerjanos/HMM/node2.html
2) http://www.csse.monash.edu.au/~lloyd/tildeMML/Structured/HMM.html
3) http://www.autonlab.org/tutorials/hmm14.pdf
Lecture outline:
1) Quiz 5 Review
2) Exploration using Text Miner – SASPDF
3) Document clustering - AMAZON
4) About hierarchical clustering
SAS Datasets: SASPDF.xls, STOPLIST, SASPDF, AMAZON, NEWS
------- + ------ + ------- + ------- + ------- + -------
Lecture 15, Date: 04/16/07, Monday
Topic: Text Mining - Preliminary
Reading:
1) Text Mining Using SAS Software (TMUS), Chapter 1
2) GSTM Ch5, pp31-36
Slides: TM-2
Online materials for Singular Value Decomposition (SVD):
1) Basics of Matrix: http://www.xycoon.com/matrix_algebra.htm
2) http://mathworld.wolfram.com/SingularValueDecomposition.html
3) http://www.uwlax.edu/faculty/will/svd/
4) http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm
5) http://kwon3d.com/theory/jkinem/svd.html
Agenda:
1) Quiz 5 (Association Analysis)
2) Processing textual data
3) Transformations
Review questions:
1) What are the main issues in converting unstructured text to structured data?
2) How is the SVD approach applied to text mining?
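As a starting point for the two review questions above, the following is a minimal Python sketch, not the SAS Text Miner implementation. It assumes whitespace tokenization and raw term counts (no stop list, stemming, or term weighting), and it approximates a truncated SVD with power iteration and deflation so that only the standard library is needed. The four sample documents and all function names are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] counts term i in document j."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[term]) for c in counts] for term in vocab]


def top_singular_triple(a, iterations=200):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    rows, cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = A v
        w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


def truncated_svd(a, k):
    """Top-k singular triples via power iteration with deflation."""
    residual = [row[:] for row in a]
    triples = []
    for _ in range(k):
        sigma, u, v = top_singular_triple(residual)
        triples.append((sigma, u, v))
        # Remove the component just found before looking for the next one.
        for i, row in enumerate(residual):
            for j in range(len(row)):
                row[j] -= sigma * u[i] * v[j]
    return triples


def document_coordinates(triples, n_docs):
    """Coordinates of each document in the k-dimensional concept space."""
    return [[sigma * v[j] for sigma, _, v in triples] for j in range(n_docs)]


def cosine(x, y):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


# Hypothetical mini-corpus: two text-mining documents, two web-log documents.
docs = ["text mining sas", "text mining", "web log analysis links", "web log"]
vocab, matrix = term_document_matrix(docs)
coords = document_coordinates(truncated_svd(matrix, 2), len(docs))
print("same topic:     ", round(cosine(coords[0], coords[1]), 3))
print("different topic:", round(cosine(coords[0], coords[2]), 3))
```

The sketch turns unstructured text into a structured term-by-document count matrix, then replaces the many term columns with a few SVD dimensions. Documents that share vocabulary end up close together in the reduced space, which is what makes the SVD dimensions usable as inputs for clustering or predictive models.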
------- + ------ + ------- + ------- + ------- + -------
Lecture 14, Date:
Topic: Association Analysis Demonstration and Preliminary Text Mining
Reading:
1) CCWEB_TKIT Sections 2.1, 2.2
2) Getting Started with SAS Text Miner (GSTM) Ch1-4 (pp1-30)
3) RG Ch11 (pp342-343)
Agenda:
1) Exercise 3 review
2) Dissociation analysis
3) Application of association analysis in web mining
4) Introduction to text mining
5) Exercise 4 (Association Analysis)
Homework assignment 5 (Due on April 20, Friday):
1) Association analysis problem:
a. Read Chapter 5 of Data Mining Using SAS Enterprise Miner: A Case Study Approach (DMCS) carefully.
b. Redo the association analysis example ASSOCS. (The dataset is in the SAS EM library SAMPSIO. All the datasets used in these course notes are in SAMPSIO, though some have different names. For example, DMWEB in the book becomes WEBPATH in SAMPSIO.)
c. Report the outcomes.
2) Explore the example ABSTRACT in GSTM and answer the following questions:
a. What differences result from using the default stop list STOPLST instead of the requested stop list SUGISTOP?
b. What differences result from unchecking “Stemmed words as root form”?
c. What are the differences between using Hierarchical and Expectation-Maximization clustering in the configuration of the Text Miner node?
------- + ------ + ------- + ------- + ------- + -------
Lecture 13, Date: 04/11/07, Wednesday
Topic: Association Analysis
Reading:
1) RG Chapter 3 (pp78-84)
2) ADMT Ch 8.1
Slides: DM9
Agenda:
1) Quiz 4 (Clustering)
2) Homework 3 review – classification with BUY dataset
3) Basic concepts of association analysis
Review questions:
1) What are the differences between the datasets used for association analysis and those used for clustering or classification?
2) Does the order of the items in an itemset matter?
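To experiment with the review questions above, here is a minimal Python sketch of the standard support and confidence measures. It is not the SAS Enterprise Miner Association node. The five market-basket transactions and the function names are made up for illustration. Each transaction is stored as a set of items rather than as a row of fixed input variables, and itemsets are likewise treated as sets.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical transactions: one set of purchased items per customer visit.
transactions = [
    frozenset(["bread", "milk"]),
    frozenset(["bread", "milk", "beer"]),
    frozenset(["bread"]),
    frozenset(["bread", "beer"]),
    frozenset(["milk"]),
]

# Listing the items of an itemset in a different order gives the same support.
print(support(["bread", "milk"], transactions))
print(support(["milk", "bread"], transactions))

# The direction of a rule does matter: the two confidences differ.
print(confidence(["bread"], ["milk"], transactions))
print(confidence(["milk"], ["bread"], transactions))
```

In this sketch the order of items within an itemset has no effect on support, while swapping the antecedent and consequent of a rule changes its confidence.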
------- + ------ + ------- + ------- + ------- + -------
Lecture 12, Date: 04/06/07, Friday
Topic: Association Analysis
Reading:
1) RG Chapter 3 (pp78-84)
2) ADMT Ch 8.1
Slides: DM9
Agenda:
1) Basic concepts of association analysis
2) Homework 2 project – Classification modeling using dataset DMAIL. How to optimize the results.
3) Exercise 3 (Clustering)
Review questions:
1) RG p102, review question 2.
2) Use the clustering worksheet (http://zlin.ba.ttu.edu/6347/Clustering.xls) to explore different outcomes of clustering. Modify the coordinates of the instances to obtain different datasets and check the outcomes. Selectively record the k-means clustering iterations for 2 different sets of instances, including the illustrative charts.
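For readers who want to check the worksheet iterations by hand, the following is a minimal Python sketch of the standard k-means (Lloyd's) procedure that records every iteration. It is not taken from the clustering worksheet: the six points, the choice of the first two points as initial centroids, and the function name are assumptions made for illustration.

```python
import math


def kmeans_history(points, initial_centroids, max_iterations=20):
    """Run k-means and return a list of (assignments, centroids), one per iteration."""
    centroids = [tuple(c) for c in initial_centroids]
    history = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        history.append((assignments, new_centroids))
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return history


# Hypothetical instances: two visually separate groups of three points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
history = kmeans_history(points, initial_centroids=points[:2])
for step, (assignments, centroids) in enumerate(history, start=1):
    print(step, assignments, centroids)
```

Changing the coordinates or the initial centroids and rerunning shows how the recorded iterations, and sometimes the final clusters, depend on the starting configuration.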
Homework assignment 4 (Due on April 13, Friday):
1) RG p103, Computational Questions: 10 (feel free to use the clustering worksheet http://zlin.ba.ttu.edu/6347/Clustering.xls)
2) Use a clustering approach to analyze dataset S3358 (ISQS 3358 student survey data) in the shared directory under the \Other_Data subdirectory. Report a few findings with selected screenshots of the representative results. The following are a few questions that could interest the instructor:
1. Into how many groups should the students be divided? Why?
2. What are the characteristics of each group?
3. Which factors are more important in clustering the students?
------- + ------ + ------- + ------- + ------- + -------
Lecture 11, Date: 04/04/07, Wednesday
Topic: Implementing Clustering
Reading:
1) Applying Data Mining Techniques Using Enterprise Miner (ADMT), Chapter 7
2) Data Mining Using SAS Enterprise Miner: A Case Study Approach (DMCS), Chapter 4
3) RG Chapter 10 (Section 10.4)
4) Hierarchical clustering, http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/hierarchical.html, http://www.resample.com/xlminer/help/HClst/HClst_intro.htm
5) Ward algorithm, http://www.chemaxon.com/jchem/doc/user/Ward.html
Slides: DM8
Agenda:
1) Clustering analysis with SAS Enterprise Miner
2) Self-organizing map – a neural network method
Review questions:
1) RG p102 Review question 5
2) RG p323 Review question 3
------- + ------ + ------- + ------- + ------- + -------
Lecture 10, Date: 04/02/07, Monday
Topic: Introduction to Clustering
Reading:
1) Applying Data Mining Techniques Using Enterprise Miner (ADMT), Chapter 7
2) RG Chapter 3 (Section 3.3)
3) Clustering: An Introduction, http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/index.html
4) K-means clustering tutorial, http://people.revoledu.com/kardi/tutorial/kMean/index.html
Slides: DM8
Other: Clustering demo
Agenda:
1) Quiz 3 (Classification)
2) Principle of clustering
Review questions:
1) What are the main differences between clustering and classification in data mining?
2) Check the datasets that you found for Homework 1. Do they fit a clustering or a classification method?
3) RG p102, Review question 3.
------- + ------ + ------- + ------- + ------- + -------
Lecture 9, Date: 03/30/07, Friday
Topic: Neural Network for classification
Reading:
1) Applying Data Mining Techniques Using Enterprise Miner (ADMT), Chapter 5
2) RG Chapter 8
Slides: DM7
Agenda:
1) Principle of neural network for data mining
2) Home equity loan decision – Neural network
Review qu