ISQS 6347 Data & Text Mining Project (Spring 2008)




This project allows students to practice the data mining methods and SAS EM skills learned in class. The following steps are suggested for completing the project:


Step 1: Identify a project topic and determine the objectives of the data mining project. Find a suitable dataset for the project. You may use one of the datasets you found for Homework 1.

Step 2: Study and understand the dataset by exploring it. Pay attention to the quality of the data (e.g., missing values), the meaningful attributes (variables), the distributions of the attributes, and the types of variable values.

Step 3: Perform any necessary data cleansing and conversion tasks.

Step 4: Choose one method (Decision Tree, Clustering, or Association Analysis) and use SAS Enterprise Miner to develop a data mining model on the dataset.

Step 5: Fine-tune the model and explain the data mining outcomes as fully as possible with regard to the project objectives.

Step 6: Write a project report based on the outcomes of the data mining analysis.
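The exploration and cleansing work in Steps 2 and 3 is done through SAS Enterprise Miner nodes in this course; the Python sketch below only illustrates the underlying logic. It assumes hypothetical toy records in which None marks a missing value, counts missing values per variable, and then fills numeric gaps with the column mean and categorical gaps with the most frequent value. Mean/mode imputation is just one simple choice, not a recommendation for every dataset.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000.0, "region": "west"},
    {"age": None, "income": 61000.0, "region": "east"},
    {"age": 45, "income": None, "region": None},
    {"age": 29, "income": 48000.0, "region": "east"},
]


def missing_profile(rows):
    """Return the number of missing (None) values for each column."""
    counts = Counter()
    for row in rows:
        for col, val in row.items():
            if val is None:
                counts[col] += 1
    return {col: counts[col] for col in rows[0]}


def impute(rows):
    """Fill numeric gaps with the column mean and categorical gaps with the mode."""
    cols = list(rows[0].keys())
    fills = {}
    for col in cols:
        present = [r[col] for r in rows if r[col] is not None]
        if not present:
            fills[col] = None  # nothing observed; leave the column as missing
        elif all(isinstance(v, (int, float)) for v in present):
            fills[col] = mean(present)  # numeric column: use the mean
        else:
            fills[col] = Counter(present).most_common(1)[0][0]  # categorical: use the mode
    return [{c: (fills[c] if r[c] is None else r[c]) for c in cols} for r in rows]


print(missing_profile(records))
print(impute(records))
```

Running the profile again on the imputed records is a quick check that no gaps remain before modeling.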


The project report is the final deliverable for the project. It includes the following sections:


  1. Project motivation and objectives. This section presents the background of the project, its importance, the research questions, and the project objectives (1-2 pages)
  2. Overview of relevant research efforts or projects (1 page)
  3. Dataset description. This includes where the data come from, a description of the major attributes (variables), the quality of the dataset, and data preprocessing (1-2 pages)
  4. The data mining method you used and the data mining process, including the problems you encountered and how they were solved (1 page)
  5. Data mining outcomes. Attach the necessary charts from SAS Enterprise Miner (no page limit)
  6. Discussion, problems, and further work (1-2 pages)
  7. References, if any


Issues to consider while completing the project:


  1. The effect of dataset quality
  2. How to conduct data preprocessing
  3. How to select the right variables for the model
  4. How to use other SAS EM nodes, such as Transform, Variable selection, Insight, Score, Assessment, Multiplot, and Distribution Explorer, to improve the efficiency and effectiveness of data mining
  5. How to combine different data mining techniques in the project, such as applying stepwise regression to select variables for a neural network
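Variable selection can be illustrated with a simple screening step. The Python sketch below is a univariate R-square filter, not full stepwise regression: it scores each input by its squared correlation with the target and keeps the inputs above a threshold. It is similar in spirit to an R-square criterion for screening inputs, but the threshold of 0.05, the variable names, and the data are hypothetical assumptions chosen only for illustration.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between one input column and the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information about the target
    return sxy * sxy / (sxx * syy)


def screen_inputs(inputs, target, min_r2=0.05):
    """Keep inputs whose R-square with the target is at least min_r2 (hypothetical cutoff)."""
    scores = {name: r_squared(col, target) for name, col in inputs.items()}
    kept = sorted(
        (name for name, s in scores.items() if s >= min_r2),
        key=lambda name: scores[name],
        reverse=True,  # strongest inputs first
    )
    return kept, scores


# Hypothetical toy data: four candidate inputs and one numeric target.
inputs = {
    "visits": [2, 4, 6, 8, 10],
    "noise": [2, 1, 2, 1, 2],
    "constant": [7, 7, 7, 7, 7],
    "mixed": [3, 1, 4, 1, 5],
}
target = [1, 2, 3, 4, 5]

kept, scores = screen_inputs(inputs, target)
print(kept)
```

A univariate filter ignores interactions and redundancy among inputs, which is why a multivariate method such as stepwise regression is often used as a follow-up before passing the selected variables to a model like a neural network.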


The project report is due on May 7, 2008.