Syllabus: ISQS 6347, Spring 2015
Data & Text Mining
Schedule: TR 11:00a-12:20p, BA001
Instructor: Zhangxi Lin, (806) 834-1926, Email: zhangxi.lin at ttu
Social networking: LinkedIn ID: my ttu email, Facebook – Zhangxi.lin at gmail
Office hours: TR 5-6p, BA 003 (subject to change), or by appointment at BA E311.
· W 2-3:30p, BA E337: Rashid (muhammadrashid.patel at ttu), Anagha (anagha.zadbuke at ttu)
· T 8-9:30a, BA 273: Mink (pinyarat.sirisomboonsuk at ttu)
Homework submission: firstname.lastname@example.org
Important: Rawls College 2015 Symposium on Big Data, April 10, 2015, McCoy Atrium, The Rawls College, TTU
This course covers the basics of data mining, text mining, and sentiment analysis, with applications in business intelligence, customer relationship management, fraud and terrorism detection, improvement of resource utilization, click-stream web mining, and credit scoring for loan applications. The software SAS Enterprise Miner will be used extensively to develop data mining models of decision trees, classification algorithms, neural nets, clustering, and text mining.
Participants in this course are eligible to receive a data mining certificate from SAS Institute and Texas Tech University.
Prerequisites: A basic statistics course, such as ISQS 5345 “Statistical Concepts for Business & Management” or ISQS 5347 “Advanced Statistical Methods” (B or better), or equivalent. Experience with programming, SAS, and/or databases is helpful but not required.
Assessment of Learning Outcomes:
SAS Course Notes (electronic versions):
· Applied Analytics Using SAS® Enterprise Miner™ 6.1, SAS Course Notes, 664p, 2007 (AAEM)
· Data Mining – A Tutorial Based Primer, Richard Roiger, Michael Geatz, 3rd edition. Addison Wesley, 2003, ISBN 0201741288
· Sentiment Analysis and Opinion Mining, Bing Liu, 2012, Morgan & Claypool Publishers
SAS Course Materials
· Mining Textual Data Using SAS® Text Miner for SAS®9, 328p (DMTM)
· Effective Web Mining: Attracting and Keeping Valued Cyber Consumers, 632p, SAS Course Notes, 2001 (CCWEB, for EM 4.3)
· Getting Started with SAS® 9.1 Text Miner, 60p (freely downloadable from SAS’s website)
· Getting Started with SAS Enterprise Miner 6.1, 76p PDF (1.88MB)
· Data Mining - A Case Study Approach, 135p, SAS Institute, 2006
· Applying Data Mining Techniques Using Enterprise Miner, 308p (ADMT)
· Introduction to Data Mining Using SAS Enterprise Miner, Patricia B. Cerrito, SAS Publishing, ISBN: 978-1-59047-829-5 (also see http://support.sas.com/pubs for more)
· Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel® with XLMiner®, 2nd Edition, Galit Shmueli, Nitin R. Patel, Peter C. Bruce, October 2010, ©2010 (Website: http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002378.html)
· Data Science for Business: What you need to know about data mining and data-analytic thinking, By Foster Provost and Tom Fawcett, 414p, O'Reilly Media (July 27, 2013), ISBN-13: 978-1449361327, ISBN-10: 1449361323, Edition: 1st
· Introduction to Business Data Mining, David Olson, Yong Shi, McGraw-Hill Irwin, 2007, ISBN-13: 978-0-07-295971-0, ISBN-10: 0-07-295971-1, 288p
· Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison Wesley, 2005, ISBN: 0321321367 (Website: http://www-users.cs.umn.edu/~kumar/dmbook/index.php )
· Principles of Data Mining, David J. Hand, Heikki Mannila and Padhraic Smyth, The MIT Press, August 2001, ISBN 0-262-08290-X, 425 pp.
· Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, by Gordon S. Linoff, Michael J. A. Berry, Publisher: Wiley; 3rd edition (April 12, 2011), ISBN-10: 0470650931, ISBN-13: 978-0470650936
· Predictive Modeling with SAS Enterprise Miner – Practical Solutions for Business Applications, Kattamuri S. Sarma, SAS Institute, 2007
· Chapters 4 & 5, Business Intelligence: A Managerial Approach, Second Edition, Pearson Prentice Hall, 2011, Efraim Turban, Ramesh Sharda, Jay E. Aronson, David King
Print: ISBN-10 0-13-610066-X, ISBN-13 978-0-13-610066-9
eText: ISBN-10 0-13-610067-8, ISBN-13 978-0-13-610067-6
Deliverables and Grading Policy:
· Two midterm exams and one final exam, 140 points. The exam with the lowest score will be dropped.
· Four ungraded quizzes, counted toward attendance, used mainly to check the learning progress of the class
· Exercises, 60 points (100% - A+, 90%+ - A, 80%+ B, and 70%+ C), including
· Guided Exercises, 80%. These exercises are started under guidance in the classroom and completed at home
· E-learning Assignments, 20%
· Term project, 60 points (100% - A+, 90%+ - A, 80%+ B, and 70%+ C)
· Attendance, 10 points
The above is 270 points in total.
Overall letter grades are based on the percentage points earned out of the total points:
· A – 90% or higher, with an overall B or better in exams and good attendance
· B – 80 – 89.9%, with an overall C or better in exams
· C – 70 – 79.9%, with an overall C- or better in exams
· D – 60 – 69.9%
· F – below 60%
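As a rough illustration of the arithmetic above, the following Python sketch adds up the point categories and maps the resulting percentage to a letter. It is not the instructor's official calculation. The function names are hypothetical, the equal weighting of the two retained exams (70 points each) is an assumption the syllabus does not state, and the additional exam and attendance conditions attached to A, B, and C are not modeled.

```python
# Point values as listed in the grading policy.
EXAM_POINTS = 140
EXERCISE_POINTS = 60
PROJECT_POINTS = 60
ATTENDANCE_POINTS = 10
TOTAL_POINTS = EXAM_POINTS + EXERCISE_POINTS + PROJECT_POINTS + ATTENDANCE_POINTS


def exam_points(exam_fractions):
    """Return exam points after dropping the lowest of three exam scores.

    Each score is a fraction between 0 and 1.
    Assumption: the two retained exams are weighted equally.
    """
    if len(exam_fractions) != 3:
        raise ValueError("expected two midterms and one final")
    kept = sorted(exam_fractions)[1:]  # discard the lowest score
    return EXAM_POINTS * sum(kept) / len(kept)


def percentage_and_letter(exam_fractions, exercises, project, attendance):
    """Return (percentage, letter) using only the percentage cutoffs.

    The extra conditions on exam performance and attendance for
    A, B, and C are deliberately left out of this sketch.
    """
    total = exam_points(exam_fractions) + exercises + project + attendance
    pct = 100.0 * total / TOTAL_POINTS
    if pct >= 90:
        letter = "A"
    elif pct >= 80:
        letter = "B"
    elif pct >= 70:
        letter = "C"
    elif pct >= 60:
        letter = "D"
    else:
        letter = "F"
    return pct, letter


if __name__ == "__main__":
    pct, letter = percentage_and_letter([0.5, 0.9, 0.8], 54, 50, 10)
    print(f"{pct:.1f}% -> {letter}")
```

In this example the 0.5 exam is dropped, so the low score does not affect the total.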
Students are strongly encouraged to attend all class meetings, particularly because of the tight course schedule. Attendance counts for 10 points, and roll will be taken at random. Missing one class carries no penalty, but each additional missed meeting costs 5 points, so missing more than two classes results in no attendance credit. A student who has to skip a class meeting needs to inform the instructor in advance. If the absence was caused by an unexpected situation, evidence must be presented to the instructor to receive the attendance points.
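The attendance rule can be written as a small function. This is only a sketch of one reading of the policy: the function name is hypothetical, and it assumes that absences accepted by the instructor (announced in advance or supported by evidence) are not counted as missed meetings.

```python
def attendance_points(missed_meetings):
    """Attendance points (out of 10) for a number of counted absences.

    The first missed meeting is free; each further one costs 5 points,
    so more than two missed meetings leave no attendance credit.
    Assumption: excused absences are excluded from missed_meetings.
    """
    if missed_meetings < 0:
        raise ValueError("missed_meetings cannot be negative")
    penalized = max(0, missed_meetings - 1)  # absences beyond the free one
    return max(0, 10 - 5 * penalized)


if __name__ == "__main__":
    for missed in range(5):
        print(missed, attendance_points(missed))
```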
The term project must be completed in groups of no more than four students. PhD students (or those who already hold an advanced degree) may pick a research topic, with no more than two co-authors on a project team.
There are three types of projects:
1) A project based on the 2013/2014/2015 SAS data mining shootout dataset
2) A project using the datasets provided by the instructor
3) A student-selected project topic. Extra credit may apply if the project involves additional data collection, cleansing, and preprocessing work.
Exercise/Project assignments must be completed by the designated date. Late submission will result in a lower grade.
Homework submission is optional. Students are encouraged to complete all homework assignments as a review of the course content, which helps improve exam performance.
Requirements: Please contact me if you have any special requirements, or if I need to make special accommodations for you during the semester. I encourage you to visit with me about your progress in the course at any time.
Integrity. Academic dishonesty will not be tolerated. All students are required to adhere to the Texas Tech University Policy on Academic Honesty.
Civility in the Classroom. “Students are expected to assist in maintaining a classroom environment which is conducive to learning. In order to assure that all students have an opportunity to gain from time spent in class, unless otherwise approved by the instructor, students are prohibited from using cellular phones or beepers, eating or drinking in class, making offensive remarks, reading newspapers, sleeping or engaging in any other form of distraction. Inappropriate behavior in the classroom shall result in, minimally, a request to leave class.”
ADA Requirements. Classroom accommodations will be made for students with disabilities, if requested.
Religious Holidays. A student who intends to observe a religious holy day should make that intention known to the instructor prior to an absence. A student who is absent from classes for the observance of a religious holy day shall be allowed to take an examination or complete an assignment scheduled for that day within a reasonable time after the absence.
· SAS related websites
· Data mining tutorials:
· Stanford University’s courseware: http://www.mmds.org/
· Statistical Data Mining Tutorials http://www.autonlab.org/tutorials/list.html
· TutorialsPoint: http://www.tutorialspoint.com/data_mining/dm_useful_resources.htm
· Top 10 data mining video tutorials: http://mydatamine.com/top-10-data-mining-video-sites/
· General data mining related resources:
· Open Source Software RapidMiner: http://rapidminer.com/
· StatLib: http://lib.stat.cmu.edu/
· KDNuggets: http://www.kdnuggets.com/
· Open source data mining projects: http://www.kdkeys.net/forums/72/ShowForum.aspx
· Open source data mining tools: http://dmoz.org/Computers/Software/Databases/Data_Mining/Public_Domain_Software/
· Tools, methods, solutions, and hints
· How to read SPSS or Stata data files into SAS using Proc Import? http://www.ats.ucla.edu/stat/sas/faq/stata_spss.htm
Note: For updating the VPN access to TTU’s campus network, see: http://www.depts.ttu.edu/ithelpcentral/solutions/vpn.