ISQS 6347 Data & Text Mining Project (Spring 2013)

 


 

Note: these instructions are subject to updates.

 

Note:

1.    The early bird catches the worm: start the project as early as possible.

2.    Students are welcome to sign up for the SAS 2013 data mining shootout contest regardless of which type of term project they choose. A term project team may include students who belong to different SAS 2013 shootout project groups.

 

Type I: SAS Data Mining Shootout Project

 

This type of project requires students to take part in the 2013 SAS data mining shootout contest, using the dataset provided by SAS.

 

Steps:

1)    Study the instructions and datasets for the 2010-2012 SAS data mining shootouts to understand the criteria and workload (available on the shared network drive). Complete a proposal in Stage 1.

2)    Sign up on the 2013 SAS data mining shootout website when it becomes available.

3)    Complete the preliminary data mining tasks with the dataset provided by SAS in Stage 2.

4)    Complete a report for the term project in Stage 3.

5)    Participate in the SAS data mining shootout during the summer and finalize the report by early July.

 

Type II: Regular SAS Data Mining Project

 

There are several choices for the project topic:

1)    Use the dataset downloadable from Prosper.com (references: “mnsc.1110.1459.full.pdf”, “Puro DSS 201004.pdf” and “Puro DSS 201011.pdf” in \Term project\References in the shared directory)

2)    Use a financial dataset, such as stock prices, REIT series, etc. Such datasets can be found in WRDS (http://wrds-web.wharton.upenn.edu/wrds/)

3)    Use a KDD cup dataset (check http://www.kdnuggets.com/datasets/kddcup.html)

4)    A text mining project is also acceptable. Check Amazon.com for product review comments.

5)    Use another dataset found by the project team; it must be approved by the instructor.

 

Type III: Research Paper Using a Data Mining Approach

 

This type of project is suited to PhD students, normally with no more than two students co-authoring a paper. A dataset may be shared by several paper projects as long as the papers have different research focuses. Master's students may join this kind of project when it is led by a PhD student. A PhD student who wants to do a regular data mining project should contact the instructor for permission.

 

 

The project is completed in the following stages:

 

Stage 1 – Project proposal (20%, due Mar 26):

 

It is very important that each project team arrange a meeting with the instructor during Feb 23-Mar 7 to discuss the project topic and the work plan.

 

Identify a specific data mining topic and determine the objectives of the project. A proposal of 2-4 pages is required, covering:

1)    motivations,

2)    objectives,

3)    business background description,

4)    dataset availability,

5)    data quality, and

6)    a description of data preprocessing tasks, if any.

Submission: A hard copy of the proposal is required.

--------------------

 

Stage 2 – Preliminary data mining (40%, due April 23):

 

Perform the necessary data cleansing and conversion tasks. Report the data quality status (missing values, coding conditions, format problems, etc.) and the preprocessing tasks you have done (data format conversion, value recoding, etc.). You can contact the instructor for help with data processing. If the data is already clean enough, report the outcomes of data exploration instead. The reported information can include, but is not restricted to, the size of the data, the distributions of key variables, and any interesting preliminary findings.
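As an illustration only, the kind of data quality figures described above can be sketched in a few lines of Python. The project itself must be done in SAS Enterprise Miner; this sketch is not part of the required workflow. The missing-value codes, the `profile` helper, and the sample loan records are all assumptions made up for the example.

```python
from collections import Counter

# Assumed codes that stand for a missing value (adjust to the actual dataset)
MISSING = {"", ".", "NA", "N/A", "null"}

def profile(rows, columns):
    """Summarize each column: row count, missing count, distinct values, most common value."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values
                   if v is not None and str(v).strip() not in MISSING]
        counts = Counter(present)
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "top": counts.most_common(1)[0][0] if counts else None,
        }
    return report

# Hypothetical records, invented for illustration only
rows = [
    {"grade": "A", "amount": "5000"},
    {"grade": "B", "amount": ""},
    {"grade": "A", "amount": "NA"},
    {"grade": None, "amount": "7500"},
]

summary = profile(rows, ["grade", "amount"])
for col, stats in summary.items():
    print(col, stats)
```

A table built from such counts, with a sentence explaining each notable figure, is one way to present the data quality status in the report.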

Choose some data mining techniques, such as Decision Tree, Regression, Clustering, Association Analysis, Link Analysis, or Text Mining, and use SAS Enterprise Miner to develop a data mining model on the dataset.

1)    In the submitted report, highlight the following information on the cover page:

a.    The group number and group members

b.    The type of this project

c.    The nature and source of the dataset

2)    The preliminary data mining report is expected to contain the following:

a.    Data preparation, data exploration, and the progressive data mining process. Show, with a systematic approach and an appropriate methodology in a logical order, how you will conduct the data mining.

b.    The preliminary results, including data exploration outcomes, preliminary findings, and brief explanations. Any charts or tables must come with sufficient explanation.

You can merge content from the proposal with the necessary modifications, which will make your report at this stage coherent and polished. Simply copying and pasting from the old report into the current one will not help: a long report is not favored.

 

Submission: Both a hard copy and an e-copy are required. Email address: isqs6347@gmail.com.

 

-----------------------

 

Stage 3 – Final report (40%, due May 15 by 2:00 p.m.):

Produce a final project report based on the data mining analysis outcomes, with the necessary modifications and refinements.

 

A.    The final project report is the final deliverable for the project. It includes the following parts:

 

  1. Cover page. The cover page must have a table at the bottom containing the following information, in addition to other information:

 

a.    Project title
b.    Class number / Semester
c.    Student name
d.    The type of this project
e.    The nature and source of the dataset
f.    Completion date

 

 

  2. Table of contents
  3. Project motivation and objectives. This section presents the background of the project, its importance, the research questions, and the project objectives (1-2 pages).
  4. Overview of relevant research efforts or projects (1 page).
  5. Dataset description, including where the dataset comes from, a description of the major attributes (variables), the quality of the dataset, and data preprocessing (2-3 pages).
  6. The data mining method you used and the data mining process, including the problems you encountered and how they were solved (1 page).
  7. Data mining outcomes. Attach the necessary charts from SAS Enterprise Miner; the number of pages is not restricted.
  8. Discussions, problems, and further work (1-2 pages).
  9. References, if any

 

You may reuse the materials you developed for your previous deliverables.

 

B.    In general, the report must demonstrate your knowledge of both data mining and the business issue addressed. It must look professional. The report body should not exceed 30 pages (12-point font). The detailed grading criteria are:

 

1.    The writing quality of the report, such as the completeness of the contents with regard to the above requirements, and the coherence and correctness of the writing.

2.    The effort made in data collection and data preprocessing

3.    Data mining skills and strategies

4.    Comprehensiveness of data analysis results and explanations. In-depth discussions tied to the business background are encouraged.

5.    Completeness and timeliness of the deliverables

 

C.   Issues you are expected to tackle during the project:

 

  1. The effect of dataset quality
  2. How to conduct data preprocessing
  3. How to select the right variables for the model
  4. How to use other SAS EM nodes, such as Transform, Variable Selection, Insight, Score, Assessment, Multiplot, and Distribution Explorer, to improve the efficiency and effectiveness of data mining
  5. How to combine different data mining techniques in the project, such as applying stepwise regression to select variables for a neural network
  6. How to explain the data mining results
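The idea in item 5 can be illustrated with a rough Python sketch. It is not how the SAS Enterprise Miner Regression node works: it is a simplified greedy forward selection that adds the variable giving the largest relative drop in residual sum of squares, whereas true stepwise regression typically uses significance tests for both entry and removal. The `min_gain` threshold, the helper names, and the sample data are assumptions invented for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def sse(rows, y, cols):
    """Residual sum of squares of a least-squares fit of y on an intercept plus cols."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, t in zip(design, y))

def forward_select(rows, y, names, min_gain=0.05):
    """Add variables one at a time while the relative SSE reduction is at least min_gain."""
    chosen = []
    current = sse(rows, y, [])
    while len(chosen) < len(names):
        remaining = [c for c in range(len(names)) if c not in chosen]
        best = min(remaining, key=lambda c: sse(rows, y, chosen + [c]))
        new = sse(rows, y, chosen + [best])
        if current == 0 or (current - new) / current < min_gain:
            break
        chosen.append(best)
        current = new
    return [names[c] for c in chosen]

# Made-up data: y depends on x1 only; x2 is an unrelated 0/1 flag
rows = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
y = [2.5, 3.5, 5.5, 8.5, 10.0, 12.0]
selected = forward_select(rows, y, ["x1", "x2"])
print(selected)
```

The variables kept by such a selection step would then be the only inputs passed on to the neural network model, which is the combination of techniques that item 5 describes.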

 

D.   The following situations may have negative effects on your report evaluation and should be avoided:

 

1.    Irrational research story and goal  

2.    Insufficient information

3.    Incoherent logic flow in the report

4.    Overloading the report with trivial information. That is, you need to know what is important and what is not.

5.    Using too many charts generated from the data mining process without enough explanation.

 

The final version of the project report must be submitted both as an electronic copy (sent to isqs6347@gmail.com) and as a hard copy.

 

Note: The effort made in data preparation will increase the credit given to the report.

 

---------------------------------------------