ISQS 3358 Business Intelligence Project
Instructor:
In this
project up to 3-4 students form a team to fulfill a data mart development
project. The project includes deliverables in five different stages:
Stage 1 (10%): Project proposal. The selection of the project mainly relies on the
availability of the dataset and your understanding of the business story about
the data. The deliverable is a project proposal in 2-4 pages, containing (1)
Project topic, (2) The availability of the dataset, (3) analytic themes, (4) A
conceptual dimensional model, and (5) A list of references. Due
March 9, Monday.
FAQ:
What does
“The availability of the data set” mean? You need to confirm the data set used for the
project is available. Explain as much as possible (1) the business background
of the data, (2) the source of the data, and (3) the quality of the data
What is
“analytic themes”? They are business needs or requirements for the project. They
can be obtained by interviewing people in the business or a thorough
investigation. Analytic themes reflect the major challenges and opportunities
an organization is facing. Based on the analytic themes you will be able to
determine the goal and objectives of data mart development.
What is “A
conceptual dimensional model”? This is a kind of logic design of the data mart
with main table names but without much details in the tables.The draft of the dimensional model in a diagram
format to be implemented later.
Stage 2 (20%): Data mart development. Based on the data you collected in
the first stage, the tasks in this stage: (1) dimensional model design (at
least four dimensions), and (2) data mart creation but not necessarily being
populated. The description of the data mart will be 3-5 pages including figures
or screenshots of the designed model. Due March 25, Wednesday.
Contents
of the deliverables:
1.
A dimensional model diagram, either PowerPoint slide or the
screenshot of the data source view. This must be the refined version from the
“conceptual dimensional model” in Stage 1.
2.
Screenshots of design pages for each table from SQL Server
Management Studio.
3.
A brief description of the considerations in the dimensional
model design.
a.
How time dimension is defined? In what granularity?
b.
The hierarchy within a dimension, if any. (For example, State,
county, city, and street are in a hierarchical structure).
c.
The aggregate dimensions, if any. (For example, in MaxMinManufacturingDM, DimMachineType
is an aggregate from DimMachine).
d.
Indicate the primary key and foreign keys in each table.
e.
Others.
Stage 3 (20%): Data mart populating with ETL system
implementation.
Refer to Exercise 4-6 to complete this assignment. The
deliverables: (1) The screenshot of ETL system logic design, (2) A brief
description of ETL system implementation, (3) Notes about data quality and
other issues during data preprocessing. It
will be 2-3 pages including figures and/or screenshots. Due April 6, Monday.
Stage 4
(25%): Analysis report: The analyses of the data in the data mart. The report
contains:
(1) 4-6 findings from the data analysis.
(2) 2-4 diagram/charts/tables generated with
SAS Enterprise Guide.
(3) An optional choice of the data analysis is
to do OLAP with the cube created from your data mart. Try to wok out the OLAP
application outcomes using SAS Enterprise Guide or SSAS. You may “drill down”
or “drill up” the OLAP cube. Both SAS EG and SSAS Browser allow you to expand
the dimension and to explore the data in tables or graphs.
The
report’s length is 2-4 pages plus
figures and tables. Due April
20, Monday.
Stage 5 (25%): The compilation of the previous deliverables with
modifications and enhancements. Hardcopy and electronic version are both needed, due May 1, Friday.
(1)
The report will contain the following parts
a.
Cover page. Including the team’s name, team members, title of
the report, course number and name, and the date the report is submitted.
b.
Table of contents
c.
One-page executive summary
d.
Introduction section. Mainly based on the contents of
Deliverable #1 but with extensions:
-
The background of the project, such as what are the businesses
supported by this project.
-
Analytic themes, i.e. business requirement to this project.
Relevant business processes that generate data and use the data are expected
-
The availability of the dataset, including the quality of the
data. You have process the data for ETL. So you should have pretty much to say
about this.
e.
Data mart development section, mainly deliverable #2 with
necessary modifications
f.
Data mart populating section, mainly deliverable #3 with necessary
modifications
g.
Analysis report section, mainly deliverable #4 with necessary
modifications (highly suggested to do so)
h.
Summary section of the project, 1-2 page.
It may contain: what you have learned, the top three comments on this project,
the progress of the project regarding the time schedule.
i.
Reference list. If a reference is from the Internet, you need to
list the title of the page in addition to the url address.
(2)
A hardcopy of the report will be submitted for grading, plus an
electronic copy emailed to zhangxi.lin@hotmail.com.
Project topic selection guidelines:
You may pick one of the following
topics, or whatever else you like with the available data:
1) MSBA Program in the ISQS Area. What are problems? How to increase
the competence of the program? What could be done for promoting the mater
programs in ISQS?
2) Online targeted marketing. Check http://www.xplusone.com, http://www.doubleclick.com, or eBay’s AdSense.
3) Medical services. If you are working for UMC, what
should you do in the competition with the Covenant System?
4) Agricultural business intelligence -
Cotton businesses
5) Online real estate information
services. Check http://www.commrex.com, http://www.loopnet.com. If you are the owner
of IMW what should you do to improve the competence?
6) Car retailing business. Based on the Toyota TLS case, what
are new challenges? What can be improved? How?
7) Online product pricing. How can longs.com automate the
priced differentiated with locations?
The examples of data set:
1) Customer relationship management data.
2) Employee data
3) Product data
4) Advertisement data
5) Transportation means data
If the
original data is not clean enough, you need to preprocess the data. You may
generate new dimensional tables from the main dataset as you have done in the
exercises.
Merits of the project outcomes
1) Carefully developed project proposal
demonstrating the understanding of the business requirements, attractive
analytics themes, and clearly defined project goal and objectives
2) Comprehensive data mart design, such
as multiple fact tables, with supporting analytic themes
3) Applications of advanced ETL model
or techniques, such as slowly changing dimensions, the use of containers, etc.
4) Advanced OLAP cube design, and/or optional
MDX scripting by self-taught
5) Rich data analysis outcomes
6) Well-presented final report
7) Demonstrating the creative ideas and
skillful data warehousing ability