ISQS 6347 Data & Text Mining
Projects (Spring 2007)
Project 1 (check the Example)
This project will allow students to practice the data mining methods and SAS EM skills learned in class. The following are the suggested steps for completing the project:
Step 1:
Identify a project topic and determine the objectives of the data mining
project. Find an available dataset for the project. You can use one of the
datasets you found for homework 1.
Step 2:
Study and understand the dataset by exploring it. Pay attention to the quality of the data (e.g., missing values), the meaningful attributes (variables), the distributions of the attributes, and the types of variable values.
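The exploration in Step 2 is meant to be done in SAS Enterprise Miner, but the checks themselves are tool-independent. As a minimal illustration only, the Python sketch below profiles each column of a small CSV sample for missing values, an inferred type (numeric or categorical), and a simple distribution summary. The `profile` function and the toy `SAMPLE` data are hypothetical and made up for this example; they are not part of the course materials or of SAS EM.

```python
import csv
import io
from collections import Counter


def profile(rows):
    """Summarize each column: missing count, inferred type, and value distribution."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r[col] for r in rows]
        # Treat empty strings (and None) as missing values.
        present = [v for v in values if v not in ("", None)]
        missing = len(values) - len(present)
        try:
            # If every non-missing value parses as a number, call the column numeric.
            nums = [float(v) for v in present]
            kind = "numeric"
            dist = ({"min": min(nums), "max": max(nums), "mean": sum(nums) / len(nums)}
                    if nums else {})
        except ValueError:
            # Otherwise treat it as categorical and count the most frequent values.
            kind = "categorical"
            dist = dict(Counter(present).most_common(5))
        summary[col] = {"missing": missing, "type": kind, "distribution": dist}
    return summary


# Hypothetical toy data with a few deliberately missing fields.
SAMPLE = """age,income,region
34,52000,North
,61000,South
45,,North
29,43000,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for col, info in profile(rows).items():
    print(col, info)
```

Inferring the type from the values, rather than trusting the file, mirrors the Step 2 advice to check the types of variable values before any cleansing in Step 3.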
Step 3:
Perform necessary data cleansing and conversion tasks.
Step 4:
Choose one of decision tree, clustering, OR association analysis, and use SAS Enterprise Miner to develop a data mining model on the dataset.
Step 5:
Fine-tune the model and explain the outcomes of the data mining as fully as possible with regard to the project objectives.
Step 6:
Write a project report based on the outcomes of the data mining analysis.
The project report is the final deliverable for the project. It includes the following sections:
Issues encountered during the project:
The project report is due on April 13, 2007.
------------------------------------------------
Project 2 Online Movie Recommendation System Design
LOS GATOS, Calif., October 2, 2006 –
Netflix, Inc. (Nasdaq: NFLX), the world's largest online movie rental service,
today announced the creation of the Netflix Prize, an award of one million
dollars to the first person who can achieve certain accuracy goals in
recommending movies based on personal preferences. The company also made
available to contestants 100 million anonymous movie ratings ranging from one
to five stars, the largest such data set ever released. (http://www.netflix.com/MediaCenter?id=5368)
The second project, motivated by the Netflix Prize described above, is to design an online movie recommendation system. Completing it will require a good deal of independent learning. Here are some initial guiding questions:
1) What is an online recommendation system? How does it work?
2) What is the business model for an online recommendation system?
3) What data mining techniques does an online recommendation system need, and how are they applied?
4) How do you design a web-based recommendation system?
5) How is data dynamically collected for the system?
Use the following keywords when researching online recommendation systems: targeted advertising, recommender system, collaborative filtering, content-based, conditional frequency, text mining, clustering, classification, segmentation, keyword-based.
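To give a concrete starting point for the "collaborative filtering" keyword, here is a minimal sketch of textbook user-based collaborative filtering on one-to-five-star ratings like those Netflix released. It is not Netflix's method or the one this course prescribes. The ratings, user names, movie titles, and the functions `cosine_similarity`, `predict`, and `recommend` are hypothetical. The sketch scores users' similarity with cosine similarity over co-rated movies, then predicts an unseen movie's rating as a similarity-weighted average of other users' ratings.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie: stars (1-5)}.
RATINGS = {
    "alice": {"Movie A": 5, "Movie B": 3, "Movie C": 4},
    "bob":   {"Movie A": 4, "Movie B": 2, "Movie C": 5, "Movie D": 4},
    "carol": {"Movie A": 1, "Movie B": 5, "Movie D": 2},
}


def cosine_similarity(r1, r2):
    """Cosine similarity computed only over the movies both users rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[m] * r2[m] for m in common)
    norm1 = sqrt(sum(r1[m] ** 2 for m in common))
    norm2 = sqrt(sum(r2[m] ** 2 for m in common))
    return dot / (norm1 * norm2)


def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for `movie`, or None."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        if sim > 0:
            num += sim * their[movie]
            den += sim
    return num / den if den else None


def recommend(ratings, user, top_n=3):
    """Rank movies the user has not rated by their predicted rating."""
    unseen = {m for r in ratings.values() for m in r} - set(ratings[user])
    scored = [(m, predict(ratings, user, m)) for m in unseen]
    scored = [(m, s) for m, s in scored if s is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


print(recommend(RATINGS, "alice"))
```

Here "Movie D" is the only movie alice has not rated, and its predicted score falls between carol's 2 and bob's 4, closer to bob's because bob's past ratings resemble alice's more. A realistic design would likely use mean-centered (Pearson) similarity and far larger, sparser data. Those are among the technical issues Part Two asks groups to identify.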
The report will include the following contents:
1) An overview of online recommendation systems – 1 page
2) The requirements analysis of the Netflix online movie recommendation system – 1 page
3) Business model and business processes – 1 page
4) The implementation scheme of the system and its logical structure – 1-2 pages, with diagrams if necessary
5) Reference list in APA format (http://owl.english.purdue.edu/owl/resource/560/01/)
Every project group must complete the above as Part One of the project. Each group will then choose one of the following tasks as Part Two, according to its preference:
Choice 1: Extend the above project by performing the following additional tasks
1) Identify the data sources your team would use to tackle the problem, and work out methods for collecting the data (you do not need to actually collect it).
2) Identify the critical technical issues in decision modeling and the main algorithms for a recommendation system.
3) Choose one of these issues that is solvable and work out the results. The issue could be the optimization of resource allocation, a recommendation algorithm, etc. Using and analyzing a dataset is preferred.
4) Discuss the unresolved issues and possible resolutions.
5) Anything else relevant, such as the business model, business processes, and so forth.
Choice 2: Regular data mining project
1) Project groups may choose another data mining topic, not necessarily related to recommender systems. You may use the same dataset as in Project 1 or a different one, but the focus must differ from that of Project 1. If you use the same dataset, you do not need to describe it at length, provided you did so well in the Project 1 report.
2) Using a new dataset is also a good choice. Text mining topics are encouraged.
3) In addition to the basic requirements described for the Project 1 report, the following factors justify an “A” quality report:
· Solve the problem with a comprehensive data mining model that goes beyond the issues addressed in class, demonstrating that the group has taught itself and done creative work with the tool.
· Use a real dataset from a business, where the topic and the findings of the data mining have important implications for that specific business.
· Solve some special technical problem in using SAS Enterprise Miner, and explain how you solved it.
Project 2 is due on May 7, 2007.
The above is a preliminary outline of the project instructions, provided as early notice. More information will be added. In the meantime, early birds will always benefit from proactive learning.