ISQS 6347 Data & Text Mining Projects (spring 2007)

 

Project 1 (check the Example)

 

This project will allow students to practice the data mining methods and SAS EM skills learned in class. The following are the suggested steps for completing the project:

 

Step 1: Identify a project topic and determine the objectives of the data mining project. Find an available dataset for the project. You can use one of the datasets you found for homework 1.

Step 2: Study and understand the dataset by exploring it. Pay attention to the quality of the data (any missing values), the meaningful attributes (variables), the distributions of those attributes, and the types of variable values.

Step 3: Perform necessary data cleansing and conversion tasks.

Step 4: Choose Decision Tree, Clustering, OR Association Analysis and use SAS Enterprise Miner to develop a data mining model on the dataset.

Step 5: Fine-tune the model and explain the data mining outcomes as fully as possible with regard to the project objectives.

Step 6: Write a project report based on the outcomes of the data mining analysis.
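
The checks named in Step 2 (missing values, variable types, and distributions) can also be sketched outside SAS Enterprise Miner. The following is a minimal illustration in plain Python, not part of the course tooling; the records, the variable names, and the `profile` helper are hypothetical and stand in for whatever dataset the group chooses.

```python
from collections import Counter

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000.0, "segment": "A"},
    {"age": None, "income": 61000.0, "segment": "B"},
    {"age": 45, "income": None, "segment": "A"},
    {"age": 29, "income": 48000.0, "segment": None},
]


def profile(rows):
    """Summarize missing counts, value types, and distributions per variable."""
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        present = [v for v in values if v is not None]
        info = {
            "missing": len(values) - len(present),
            "types": sorted({type(v).__name__ for v in present}),
        }
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric variable: report range and mean.
            info["min"] = min(present)
            info["max"] = max(present)
            info["mean"] = sum(present) / len(present)
        else:
            # Categorical variable: report value frequencies.
            info["counts"] = dict(Counter(present))
        summary[name] = info
    return summary


report = profile(records)
for name, info in report.items():
    print(name, info)
```

A summary like this helps decide which cleansing and conversion tasks Step 3 actually needs, for example whether a variable with many missing values should be imputed or dropped.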

 

The project report is the final deliverable for the project. It includes the following sections:

 

  1. The project motivation and objectives. This section presents the background of the project, the importance of the project, the research questions, and the project objectives, 1-2 pages
  2. An overview of relevant research efforts or projects, 1 page
  3. Dataset description. It includes: where the dataset comes from, a description of the major attributes (variables), the quality of the dataset, and the data preprocessing, 1-2 pages
  4. The data mining method you used and the data mining process, including the problems you encountered and how they were solved, 1 page
  5. Data mining outcomes. You need to attach the necessary charts from SAS Enterprise Miner; there is no page limit
  6. Discussions, problems, and further work, 1-2 pages
  7. References, if any

 

Issues to consider while carrying out the project:

 

  1. The effect of dataset quality
  2. How to conduct data pre-processing
  3. How to select the right variables for the model
  4. How to use other SAS EM nodes, such as Transform, Variable selection, Insight, Score, Assessment, Multiplot, Distribution Explorer, to improve the efficiency and effectiveness of data mining
  5. How to combine different data mining skills for the project, such as applying the stepwise regression for neural network variable selection.

 

The project report is due on April 13, 2007.

 

------------------------------------------------

 

Project 2 Online Movie Recommendation System Design

 

LOS GATOS, Calif., October 2, 2006 – Netflix, Inc. (Nasdaq: NFLX), the world's largest online movie rental service, today announced the creation of the Netflix Prize, an award of one million dollars to the first person who can achieve certain accuracy goals in recommending movies based on personal preferences. The company also made available to contestants 100 million anonymous movie ratings ranging from one to five stars, the largest such data set ever released. (http://www.netflix.com/MediaCenter?id=5368)

 

The second project takes the above prize as its motivation: the goal is to design an online movie recommendation system. To accomplish this project, you will need to learn a lot. Here are some initial guiding questions:

1)      What is an online recommendation system? How does it work?

2)      What is the business model for an online recommendation system?

3)      What data mining techniques are needed for an online recommendation system, and how are they applied?

4)      How should a web-based recommendation system be designed?

5)      How is data dynamically collected for the system?

 

Use the following keywords to search for material on online recommendation systems: targeted advertising, recommender system, collaborative filtering, content-based, conditional frequency, text mining, clustering, classification, segmentation, keyword-based.
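
To make one of these keywords concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is only one of several possible approaches and is not a prescribed design for the project; the users, movies, ratings, and the `cosine` and `predict` helpers are all hypothetical. The idea is to predict a user's rating for an unseen movie as a similarity-weighted average of the ratings given by other users.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 star scale: user -> {movie: stars}.
ratings = {
    "ann": {"Alien": 5, "Brazil": 3, "Casablanca": 4},
    "bob": {"Alien": 4, "Brazil": 2, "Casablanca": 5, "Dune": 4},
    "cat": {"Alien": 1, "Brazil": 5, "Dune": 2},
}


def cosine(a, b):
    """Cosine similarity computed over the movies both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict(user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[movie]
        den += sim
    # None signals that no other user has rated the movie.
    return num / den if den else None


print(predict("ann", "Dune"))
```

Because "ann" rates movies more like "bob" than like "cat", the prediction for "Dune" is pulled toward bob's rating. A real system would also need to handle rating-scale differences between users and the sparsity of a dataset with millions of ratings.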

 

The report will include the following contents:

1)      An overview of online recommendation systems – 1 page

2)      The requirement analysis of the Netflix online movie recommendation system – 1 page

3)      Business model and business processes – 1 page

4)      The implementation scheme of the system and its logic structure – 1-2 pages with diagrams if necessary

5)      Reference list in APA format (http://owl.english.purdue.edu/owl/resource/560/01/)

 

Part one, described above, must be completed by every project group. Each group then chooses one of the following tasks as Part two of the project, according to its preference:

 

Choice 1: Complete the above project by performing more tasks

1)      Identify the data sources your team would use to tackle the problem and work out the methods for data collection (you don’t need to actually collect the data)

2)      Identify critical technical issues of the decision modeling and main algorithms for a recommendation system

3)      Choose one of the issues that is solvable and work out the results. The issue could be the optimization of resource allocation, a recommendation algorithm, etc. Using and analyzing a data set is preferred.

4)      Discuss the unresolved issues and possible resolutions

5)      Anything else relevant, such as the business model, business process, and so forth.
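
For groups that work out a recommendation algorithm under task 3, the results need some accuracy measure. The Netflix Prize scored predictions by root mean squared error (RMSE) against held-out ratings, so a sketch of that measure is given here. The rating values and the `rmse` helper below are hypothetical and for illustration only.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


# Hypothetical held-out star ratings and a model's predictions for them.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]

score = rmse(predicted, actual)
print(score)
```

A lower RMSE means the predictions are closer to the true ratings. Comparing the score of your algorithm with that of a simple baseline, such as always predicting the average rating, shows whether the algorithm adds anything.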

 

Choice 2: Regular data mining project

1)      Project groups may choose another data mining topic not necessarily related to recommender systems. Feel free to use the same dataset as in Project 1 or a different one, but the focus must differ from that of Project 1.  If the same dataset is used, you don’t need to describe it at length, provided you did so well in the Project 1 report.

2)      Using a new dataset is also a good choice.  Text mining topics are encouraged.

3)      In addition to the basic requirements described for the Project 1 report, the following factors help justify an “A” quality report:

·         Solve the problem with a comprehensive data mining model that goes beyond the issues addressed in class, demonstrating that the group has taught itself and done creative work in using the tool.

·         Use a real dataset from a business, with a topic and findings that have important implications for that specific business.

·         Solve a special technical problem in using SAS Enterprise Miner, and explain how you solved it.

 

Project 2 is due on May 7, 2007.

 

The above is a preliminary outline of the project instructions, provided as early notification. More information will be added.  However, early birds will always benefit from proactive learning.