Syllabus: ISQS 6347

Data & Text Mining

 


 

Schedule: TTh 12:30-1:50p, BA 363 (lab) or BA 056 (sometimes used for lectures)

Instructor: Zhangxi Lin (Zhangxi.lin@ttu.edu), (806) 742-1926, BA 708; Office hours: TTh 2-4p, or by appointment.

Course Description:

This course covers the basics of data mining and text mining, with applications in business intelligence, customer relationship management, fraud and terrorism detection, improvement of resource utilization, clickstream web mining, and credit scoring for loan applications. SAS Enterprise Miner will be used extensively to illustrate the use of decision trees, classification algorithms, neural nets, clustering, and other data and text mining techniques.

Participants in this course are eligible to receive a data mining certificate from SAS Institute and Texas Tech University.

Learning objectives:

  • Understanding the general principles of data mining
  • Being able to apply the commonly used functions of SAS Enterprise Miner to solve data mining problems
  • Developing the skills of data mining modeling and data analysis with SAS Enterprise Miner

Prerequisites: A basic statistics course, such as ISQS 5345 “Statistical Concepts for Business & Management” or ISQS 5347 “Advanced Statistical Methods” (B or better), or equivalent. Experience with programming, SAS, and/or databases is helpful but not required.

Textbook:

Required: Data Mining, Roiger and Geatz. 3rd edition. Addison Wesley, ISBN 0201741288 (this book comes with a Microsoft Excel-based data miner called iData Analyzer).

Optional: Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Addison Wesley, 2005, ISBN 0321321367

(Website: http://www-users.cs.umn.edu/~kumar/dmbook/index.php )

Teaching style: Case-based, hands-on learning

Deliverables and Grading Policy:

  • Six quizzes out of seven (150 points)
  • Homework (30 points)
  • Two projects (120 points)

The total is 300 points.

Projects:

The first project must be done individually, using a data set that you find yourself. Data sets collected directly from the real world are highly encouraged. The second project will be done in groups of 2-3 students. Presentations are also given by group.

References:

  • Selected online resources:
      • StatLib: http://lib.stat.cmu.edu/
      • MLnet: http://www.mlnet.org/
      • KDNuggets: http://www.kdnuggets.com/
      • Weka: http://www.cs.waikato.ac.nz/ml/weka/
      • Open source data mining projects: http://www.kdkeys.net/forums/72/ShowForum.aspx
      • Open source data mining tools: http://dmoz.org/Computers/Software/Databases/Data_Mining/Public_Domain_Software/

 


Outline of the Course

I. DATA MINING FUNDAMENTALS

1. Data Mining Concepts

Definition

Data vs. knowledge

Data preprocessing

Supervised vs. unsupervised data mining

2. Data Mining Tools

iData Analyzer.

SAS Enterprise Miner

3. Data Mining Techniques

Decision Trees.

Generating Association Rules.

The K-Means Algorithm.

Clustering

4. Knowledge Discovery in Databases

II. ADVANCED DATA MINING TECHNIQUES

5. Neural Networks.

Feed-Forward Neural Networks.

Neural Network Training: A Conceptual View.

Neural Network Explanation.

6. Statistical Techniques.

Linear Regression Analysis.

Logistic Regression.

Bayes Classifier.

Clustering Algorithms.

III. TEXT MINING

7. Preliminaries

Processing Textual Data

Converting Unstructured Text to Structured Data

Transformations

Applications

8. Exploratory Analysis of Document Collections

Simple Statistical Analysis

Exploration Using Text Miner

Clustering

9. Predictive Modeling

Model Inputs Derived from Text Mining

Predictive Modeling with Text Mining Inputs

10. Web Mining