
Syllabus: ISQS 6347, Spring 2012

Data & Text Mining

 


 

 

Schedule: MW 2:00-3:20p, BA277

Instructor: Zhangxi Lin, (806) 834-1926, BA E311; Office hours: MWTr 9:30-11:30a, or by appointment.

Email: zhangxi.lin@ttu.edu

Social networking: Google Talk ID: zhangxi.lin, Twitter: zhangxi51

Homework submission: isqs6347@gmail.com

Shengxin Lin’s email address: shengxin.lin AT ttu.edu

 

Teaching Assistant: TBD

 

Course Description:

This course covers the basics of data mining and text mining, with applications in business intelligence, customer relationship management, fraud and terrorism detection, improvement of resource utilization, click-stream web mining, and credit scoring for loan applications.  The software SAS Enterprise Miner will be used extensively to illustrate use of decision trees, classification algorithms, neural nets, clustering, and other data and text mining techniques.

Participants in this course are eligible to receive a data mining certificate from SAS Institute and Texas Tech University.

Learning objectives:

  • Understanding the general principles of data mining
  • Developing the skills of data mining modeling and data analysis with SAS Enterprise Miner to solve data mining problems, which include:
    1. Classification modeling
    2. Model performance evaluation
    3. Clustering
    4. Association analysis and link analysis
    5. Text mining
    6. Web mining

Prerequisites: A basic statistics course, such as ISQS 5345 “Statistical Concepts for Business & Management” or ISQS 5347 “Advanced Statistical Methods” (B or better), or equivalent. Experience with programming, SAS, and/or databases is helpful but not required.

 

Assessment of Learning Outcomes:

  • Knowledge of the general principles of data mining will be assessed with exams and homework assignments.
  • The ability to apply SAS Enterprise Miner to solve data mining problems will be assessed by guided exercises, homework assignments, and one term project.

 

Textbooks:

SAS Course Notes (electronic versions):

·            Applied Analytics Using SAS® Enterprise Miner™ 6.1, SAS Course Notes, 664p, 2007 (AAEM)

·            Mining Textual Data Using SAS® Text Miner for SAS®9, 328p (DMTM)

·            Effective Web Mining: Attracting and Keeping Valued Cyber Consumers, 632p, SAS Course Notes, 2001 (CCWEB, for EM 4.3)

Optional:

·         Introduction to Data Mining,  Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2005, ISBN: 0321321367 (Website: http://www-users.cs.umn.edu/~kumar/dmbook/index.php )

References:

SAS Materials

·         Getting Started with SAS® 9.1 Text Miner, 60p (free download from SAS’s website)

·         Getting Started with SAS Enterprise Miner 6.1, 76p PDF (1.88MB)

·         Getting Started with SAS Enterprise Miner 5.3, 184p (free download from SAS’s website)

·         Data Mining - A Case Study Approach, 135p, SAS Institute, 2006

·         Applying Data Mining Techniques Using Enterprise Miner, 308p (ADMT)

·         Introduction to Data Mining Using SAS Enterprise Miner, Patricia B. Cerrito, SAS Publishing, ISBN: 978-1-59047-829-5 (see also http://support.sas.com/pubs for more)

Textbooks:

·         Principles of Data Mining, David J. Hand, Heikki Mannila and Padhraic Smyth, The MIT Press, August 2001, ISBN 0-262-08290-X, 425 pp.

·         Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber, Morgan Kaufmann, 2000, ISBN: 1558604898

·         Predictive Modeling with SAS Enterprise Miner – Practical Solutions for Business Applications, Kattamuri S. Sarma, SAS Institute, 2007

·         Chapters 4 and 5, Business Intelligence: A Managerial Approach, Second Edition, Pearson Prentice Hall, 2011, Efraim Turban, Ramesh Sharda, Jay E. Aronson, David King

Print: ISBN-10 0-13-610066-X, ISBN-13 978-0-13-610066-9

eText: ISBN-10 0-13-610067-8, ISBN-13 978-0-13-610067-6

·         Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, Galit Shmueli, Nitin R. Patel, Peter C. Bruce, ISBN: 978-0-470-08485-4, Hardcover, 279 pages, December 2006

·         Data Mining – A Tutorial Based Primer, Richard Roiger, Michael Geatz, 3rd edition. Addison Wesley, 2003, ISBN 0201741288

 

Deliverables and Grading Policy:

·         One final exam, 100 points

·         7 quizzes, 60 points. The lowest quiz score will be dropped (no make-up quizzes)

·         Guided exercises, 60 points. These exercises will be initially guided in the classroom and completed at home

·         E-learning assignments, 20 points

·         Term project, 80 points

·         Attendance, 20 points

The above is 340 points in total.

Letter grades are based on the percentage of points earned out of the total 340 points:

·         A – 90% or higher

·         B – 80-89.9%

·         C – 70-79.9%

·         D – 60-69.9%

·         F – below 60%
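The grading arithmetic can be sketched in a few lines of Python. This is only an illustrative sketch, not an official grade calculator: the names `MAX_POINTS`, `quiz_points`, and `letter_grade` are hypothetical, the total is taken as the 340-point sum of the listed components, and the per-quiz value of 10 points is an assumption (7 quizzes, lowest dropped, 60 points) that the syllabus does not state.

```python
# Maximum points per component, copied from the deliverables list above.
MAX_POINTS = {
    "final_exam": 100,
    "quizzes": 60,
    "guided_exercises": 60,
    "e_learning": 20,
    "term_project": 80,
    "attendance": 20,
}

# Letter-grade cutoffs as minimum percentages, highest first.
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def quiz_points(scores):
    """Sum the quiz scores after dropping the single lowest one.

    Assumption: each quiz is worth 10 points, so the best 6 of 7
    add up to at most 60 points.
    """
    return sum(sorted(scores)[1:])


def letter_grade(earned):
    """Map a dict of earned points per component to a letter grade."""
    total = sum(MAX_POINTS.values())  # 340 for the list above
    pct = 100.0 * sum(earned.values()) / total
    for cutoff, letter in CUTOFFS:
        if pct >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical student record, for illustration only.
    earned = {
        "final_exam": 85,
        "quizzes": quiz_points([10, 8, 9, 7, 10, 6, 10]),
        "guided_exercises": 55,
        "e_learning": 20,
        "term_project": 70,
        "attendance": 20,
    }
    print(letter_grade(earned))
```

With these cutoffs, a student needs at least 306 of the 340 points for an A and at least 272 for a B.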

Attendance:

Students are strongly encouraged to attend all class meetings, particularly because of the tight course schedule. Attendance counts for 20 points, and roll will be taken at random. Each of the first two missed classes costs 5 points; missing more than two classes results in no attendance credit. A student who has to miss a class meeting should inform the instructor in advance. If the absence was caused by an unexpected situation, evidence must be presented to the instructor to receive the attendance points.
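The attendance rule can be written as a small function. This is a sketch under stated assumptions: the function name `attendance_points` is hypothetical, the maximum is taken as the 20 points shown in the grading list, and only absences that were not excused by the instructor are counted.

```python
ATTENDANCE_MAX = 20        # assumption: the value shown in the grading list
PENALTY_PER_ABSENCE = 5    # points lost for each of the first two absences


def attendance_points(unexcused_absences):
    """Return attendance points for a given number of unexcused absences."""
    if unexcused_absences < 0:
        raise ValueError("number of absences cannot be negative")
    if unexcused_absences > 2:
        # More than two missed classes forfeits all attendance credit.
        return 0
    return ATTENDANCE_MAX - PENALTY_PER_ABSENCE * unexcused_absences
```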

Projects/Exercises:

The term project must be completed in groups of no more than four students. PhD students must choose a research topic, with no more than two co-authors per project team.

There are three types of projects:

1)     A project based on the 2011 SAS Data Mining Shootout dataset

2)     A project using the datasets provided by the instructor

3)     A student-selected project topic. Extra credit may be awarded if the project involves additional data collection, cleansing, and preprocessing work.

Exercise and project assignments must be completed by the designated date. Late submission will result in a lower grade.

Homework submission is optional. Students are encouraged to complete all homework assignments as a review of the course content, which helps improve exam performance.

 

Resources:

·         SAS Data Mining certificate

·         Online SAS 9.1.3 references

·         SAS Text Miner references

·         SAS Enterprise Miner online materials

·         Selected online resources:

·         StatLib: http://lib.stat.cmu.edu/

·         MLnet: http://www.mlnet.org/

·         KDNuggets: http://www.kdnuggets.com/

·         Weka: http://www.cs.waikato.ac.nz/ml/weka/

·         Open source data mining projects: http://www.kdkeys.net/forums/72/ShowForum.aspx

·         Open source data mining tools: http://dmoz.org/Computers/Software/Databases/Data_Mining/Public_Domain_Software/

·         http://www.the-data-mine.com/bin/view/Misc/DataSource

·         Previous data mining courses

·         Dr. Peter Westfall’s data mining course (Fall 2004)

·         Data Mining course (Spring 2008)

·         Data Mining course (Spring 2009)

 

Job Search:

       www.Dice.com

       www.Monster.com

       www.About.com

       www.Beyond.com

       www.Icrunchdata.com

       www.AnalyticRecruiting.com

       www.Datashaping.com  

       www.Simplyhired.com

       www.Statscareers.com

       www.Jobs.phds.org

       www.Vault.com

       www.Quantster.com

       www.Computerjobs.com

       www.Sas-jobs.com

 

University Policies:

Requirements:  Please contact me if you have any special requirements, or if I need to make special accommodations for you during the semester.  I encourage you to visit with me about your progress in the course at any time.

Integrity.  Academic dishonesty will not be tolerated.  All students are required to adhere to the Texas Tech University Policy on Academic Honesty.

Civility in the Classroom.  “Students are expected to assist in maintaining a classroom environment which is conducive to learning.  In order to assure that all students have an opportunity to gain from time spent in class, unless otherwise approved by the instructor, students are prohibited from using cellular phones or beepers, eating or drinking in class, making offensive remarks, reading newspapers, sleeping or engaging in any other form of distraction.  Inappropriate behavior in the classroom shall result in, minimally, a request to leave class.” 

ADA Requirements.  Classroom accommodations will be made for students with disabilities, if requested.

Religious Holidays.  A student who intends to observe a religious holy day should make that intention known to the instructor prior to an absence. A student who is absent from classes for the observance of a religious holy day shall be allowed to take an examination or complete an assignment scheduled for that day within a reasonable time after the absence.

 

Note: For updating the VPN access to TTU’s campus network, see: http://www.depts.ttu.edu/ithelpcentral/solutions/vpn.