ISQS 6347 Data & Text Mining Projects

 

Project 2

 

Project 2 follows the same format as Project 1, with the following differences:

 

  1. Project 2 will be done in groups.
  2. Project topic:

1)     Project 2 may use the same dataset as Project 1, but its focus must differ from that of Project 1.  If you reuse the dataset, you do not need to describe it at length, provided you described it well in the Project 1 report.

2)     Using a new dataset is also a good choice.  Text mining topics are encouraged.

  3. Group presentations will be scheduled based on the outcomes of Project 2.
  4. In addition to the basic requirements described for the Project 1 report, the following factors help justify an “A” quality report:

1)     You solve the problem with a comprehensive data mining model that goes beyond the issues addressed in class, demonstrating that you have taught yourself and done creative work with the tool.

2)     You present the issues clearly in the report, based on the data analysis outcomes, and arrive at significant findings that could potentially contribute to the research literature.

3)     You use a real dataset from a business, and the topic and the data mining findings have important implications for that specific business context.

4)     You have solved a particular technical problem in using SAS Enterprise Miner. You need to explain how you solved the problem.

  5. The project report is due Saturday, May 6, by 5 p.m. An extension will be granted upon request, but no later than 5 p.m. on May 8.

 

Group Presentations:

 

  1. Everyone will have a chance to present, for no more than 8 minutes on average.
  2. Focus on the specific issues in your project. You do not need to cover everything in the limited time; present whatever will benefit your classmates most. In particular, you can emphasize one of the four aspects listed under item 4 of the Project 2 instructions above.
  3. PowerPoint slides are required and will be emailed to the instructor after the presentation.

 

 

 

-------------------------------------------------------------------------------------------------------------

 

Project 1 (check the Example)

 

This project allows students to practice the data mining methods and SAS EM skills learned in class. The project will be done on an individual basis. The following are the suggested steps for completing the project:

 

Step 1: Identify a project topic and determine the objectives of the data mining project. Find an available dataset for the project. You can use one of the datasets you found for homework 1.

Step 2: Study and understand the dataset by exploring it. Pay attention to the quality of the data (for example, missing values), the meaningful attributes (variables), the distributions of those attributes, and the types of variable values.

Step 3: Choose Decision Tree, Clustering, OR Association Analysis and use SAS Enterprise Miner to develop a data mining model on the dataset.

Step 4: Fine-tune the model and explain the data mining outcomes as fully as possible with regard to the project objectives.

Step 5: Write a project report based on the data mining analysis outcomes.
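The data exploration in Step 2 can be previewed outside SAS Enterprise Miner before the data is imported. The following is a minimal sketch in Python, not part of the course toolset; the function name `profile_columns`, the sample data, and the column names are hypothetical and used only for illustration. It counts missing values per variable, guesses whether each variable is numeric or categorical, and summarizes its distribution.

```python
import csv
import io
from collections import Counter


def profile_columns(rows):
    """Summarize each column: missing count, inferred type, and a simple distribution."""
    summary = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        # Treat empty or whitespace-only cells as missing values.
        present = [v for v in values if v is not None and v.strip() != ""]

        # A column is considered numeric only if every present value parses as a float.
        numeric = []
        for v in present:
            try:
                numeric.append(float(v))
            except ValueError:
                break
        is_numeric = bool(present) and len(numeric) == len(present)

        info = {
            "missing": len(values) - len(present),
            "type": "numeric" if is_numeric else "categorical",
        }
        if is_numeric:
            info["min"] = min(numeric)
            info["max"] = max(numeric)
            info["mean"] = sum(numeric) / len(numeric)
        else:
            # For categorical variables, report the most frequent values.
            info["top_values"] = Counter(present).most_common(3)
        summary[name] = info
    return summary


# Hypothetical sample data; replace with the contents of your own CSV file.
SAMPLE_CSV = """age,income,segment
34,52000,A
,61000,B
45,,A
29,48000,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_columns(rows)
for column, info in report.items():
    print(column, info)
```

A summary like this can feed directly into the dataset description section of the report, and it helps decide which variables need preprocessing before modeling.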

 

The project report is the final deliverable for the project. It includes the following sections:

 

  1. The project motivation and objectives. This section presents the background of the project, the importance of the project, the research questions, and project objectives. 1-2 pages
  2. The relevant research efforts or projects overview, 1 page
  3. Dataset description. It includes: where it comes from, the description of major attributes (variables), the quality of the dataset, and data preprocessing, 1-2 pages
  4. The data mining method you used and the data mining process, including the problems you encountered and how they were solved, 1 page
  5. Data mining outcomes. Attach the necessary charts from SAS Enterprise Miner; there is no page limit for this section
  6. Discussions, problems, and further work, 1-2 pages
  7. References, if any

 

Issues to consider while carrying out the project:

 

  1. The effect of dataset quality
  2. How to conduct data pre-processing
  3. How to select the right variables for the model
  4. How to use other SAS EM nodes, such as Transform, Variable Selection, Insight, Score, Assessment, Multiplot, and Distribution Explorer, to improve the efficiency and effectiveness of data mining
  5. How to combine different data mining techniques in the project, such as applying stepwise regression to select variables for a neural network.
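To make the pre-processing and variable selection issues above concrete, here is a minimal sketch in Python. It is an illustration only, not the SAS EM workflow: it fills missing values with the median and then ranks input variables by the absolute Pearson correlation with the target. This is a simple filter, a much cruder technique than stepwise regression or the Variable Selection node. All function names and data below are hypothetical.

```python
import math
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    return [median if v is None else v for v in values]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_inputs(inputs, target):
    """Rank input variables by |correlation| with the target, strongest first."""
    scores = {
        name: abs(pearson(impute_median(column), target))
        for name, column in inputs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: three candidate inputs and a numeric target.
inputs = {
    "visits": [1, 2, None, 4, 5],   # contains one missing value
    "tenure": [3, 1, 4, 1, 5],
    "constant": [7, 7, 7, 7, 7],    # carries no information
}
target = [10, 20, 30, 40, 50]

ranking = rank_inputs(inputs, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Variables near the bottom of such a ranking are candidates for rejection before building the model, although a correlation filter only captures linear, one-variable-at-a-time relationships.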

 

The project report is due on March 30.