Syllabus: ISQS 6347 Data & Text Mining
Schedule: TTh 12:30-1:50 p.m., BA 363 (lab) or BA 056 (sometimes used for lectures)
Instructor:
Course Description:
This course covers the basics of data mining and text mining, with applications in business intelligence, customer relationship management, fraud and terrorism detection, resource utilization, clickstream web mining, and credit scoring for loan applications. SAS Enterprise Miner will be used extensively to illustrate decision trees, classification algorithms, neural networks, clustering, and other data and text mining techniques.
Participants in this course are eligible to receive a data mining certificate from SAS Institute and Texas Tech University.
Learning objectives:
Prerequisites: A basic statistics course, such as ISQS 5345 “Statistical Concepts for Business & Management” or ISQS 5347 “Advanced Statistical Methods” (grade of B or better), or equivalent. Experience with programming, SAS, and/or databases is helpful but not required.
Textbook:
Required: Data Mining, Roiger and Geatz, 3rd edition, Addison Wesley, ISBN 0201741288. The book comes with an Excel-based data mining tool called iData Analyzer.
Optional: Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2005, ISBN 0321321367 (website: http://www-users.cs.umn.edu/~kumar/dmbook/index.php).
Teaching style: Case-based, hands-on learning
Deliverables and Grading Policy:
The course total is 300 points.
Projects:
The first project must be done individually, using a data set you find yourself; data collected directly from the real world are strongly encouraged. The second project is a group project, with 2-3 students per group. Presentations are also given by groups.
References:
· StatLib: http://lib.stat.cmu.edu/
· MLnet: http://www.mlnet.org/
· KDNuggets: http://www.kdnuggets.com/
· Weka: http://www.cs.waikato.ac.nz/ml/weka/
· Open source data mining projects: http://www.kdkeys.net/forums/72/ShowForum.aspx
· Open source data mining tools: http://dmoz.org/Computers/Software/Databases/Data_Mining/Public_Domain_Software/
Outline of the Course
I. DATA MINING FUNDAMENTALS
1. Data Mining Concepts
Definition
Data vs. knowledge
Data preprocessing
Supervised vs. unsupervised data mining
2. Data Mining Tools
iData Analyzer
SAS Enterprise Miner
3. Data Mining Techniques
Decision Trees
Generating Association Rules
The K-Means Algorithm
Clustering
4. Knowledge Discovery in Databases
II. ADVANCED DATA MINING TECHNIQUES
5. Neural Networks
Feed-Forward Neural Networks
Neural Network Training: A Conceptual View
Neural Network Explanation
6. Statistical Techniques
Linear Regression Analysis
Logistic Regression
Bayes Classifier
Clustering Algorithms
III. TEXT MINING
7. Preliminaries
Processing Textual Data
Converting Unstructured Text to Structured Data
Transformations
Applications
8. Exploratory Analysis of Document Collections
Simple Statistical Analysis
Exploration Using Text Miner
Clustering
9. Predictive Modeling
Model Inputs Derived from Text Mining
Predictive Modeling with Text Mining Inputs
10. Web Mining