Syllabus: ISQS 6347, Spring 2007

Data & Text Mining

 


 

 

Schedule: MWF 11:00-12:50p, BA 363 (lab) or BA 268 (sometimes, for lectures)

Instructor: Zhangxi Lin, (806) 742-1926, BA 708; Office hours: M, WTh 1:30-3:30p, or by appointment.

Email: zhangxi.lin@ttu.edu; MSN: Zhangxi@sbcglobal.net, zhangxi.lin@hotmail.com; Google Talk ID: zhangxi.lin

 

Course Description:

This course covers the basics of data mining and text mining, with applications in business intelligence, customer relationship management, fraud and terrorism detection, improvement of resource utilization, clickstream web mining, and credit scoring for loan applications. SAS Enterprise Miner will be used extensively to illustrate the use of decision trees, classification algorithms, neural networks, clustering, and other data and text mining techniques.

Participants in this course are eligible to receive a data mining certificate from SAS Institute and Texas Tech University.

Learning objectives:

  • Understanding the general principles of data mining
  • Being able to apply the commonly used functions of SAS Enterprise Miner to solve data mining problems
  • Developing the skills of data mining modeling and data analysis with SAS Enterprise Miner
  • Mastering general data preparation skills and tools

Prerequisites: A basic statistics course, such as ISQS 5345 “Statistical Concepts for Business & Management” or ISQS 5347 “Advanced Statistical Methods” (B or better), or equivalent. Experience with programming, SAS, and/or databases is helpful but not required.

Textbook:

Required: Data Mining, Roiger and Geatz. 3rd edition. Addison Wesley, ISBN 0201741288 (This book comes with a Microsoft Excel-based data miner called iData Analyzer).

Optional:

1) Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2005, ISBN: 0321321367 (Website: http://www-users.cs.umn.edu/~kumar/dmbook/index.php)

2) Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, Galit Shmueli, Nitin R. Patel, Peter C. Bruce, ISBN: 978-0-470-08485-4, Hardcover, 279 pages, December 2006

3) Introduction to Data Mining - Using SAS Enterprise Miner, Patricia B. Cerrito, SAS Publishing, ISBN: 978-1-59047-829-5 (http://support.sas.com/pubs)

Teaching style: Case-based, hands-on learning

Deliverables and Grading Policy:

  • Six quizzes out of seven (30 points)
  • In-class exercises (120 points)
  • Homework (30 points)
  • Midterm project (40 points)
  • Term project (60 points)

The total is 280 points.

Projects:

All projects are carried out in groups. Each project group consists of 2-3 students.

References:

  • Dr. Peter Westfall’s data mining class (Fall 2004)
  • Principles of Data Mining, David J. Hand, Heikki Mannila and Padhraic Smyth, The MIT Press, August 2001, ISBN 0-262-08290-X, 425 pp.
  • Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber, Morgan Kaufmann, 2000, ISBN: 1558604898
  • Data Mining Using SAS Enterprise Miner: A Case Study Approach, SAS Institute
  • Online SAS references
  • SAS documentation (authorized persons only – please do not distribute)
  • Selected online resources:

      • StatLib: http://lib.stat.cmu.edu/
      • MLnet: http://www.mlnet.org/
      • KDNuggets: http://www.kdnuggets.com/
      • Weka: http://www.cs.waikato.ac.nz/ml/weka/
      • Open source data mining projects: http://www.kdkeys.net/forums/72/ShowForum.aspx
      • Open source data mining tools: http://dmoz.org/Computers/Software/Databases/Data_Mining/Public_Domain_Software/

 


Outline of the Course

I. DATA MINING FUNDAMENTALS

1. Data Mining Concepts

2. SAS Enterprise Miner

II. DATA MINING TECHNIQUES

3. Decision Trees

4. Logistic Regression

5. Neural Networks

6. Association Analysis

7. Clustering

III. TEXT MINING

8. Preliminaries

9. Exploratory Analysis of Document Collections

10. Predictive Modeling

IV. WEB MINING

11. Introduction

12. Data collection for web mining

13. Knowing customers

14. Attracting web visitors

15. Evaluating web visitors

16. Keeping customers