ISQS 3358 Business Intelligence
Project
Instructor:
Final Project Report
Due: April 21, Wednesday
In this project 3-4 students will form
a team to fulfill a data mart development project. The project includes deliverables
in five different stages:
Stage 1 (10%):
Project proposal. The selection of the
project mainly relies on the availability of the dataset and your understanding
of the business story about the data. The deliverable is a project proposal in
2-4 pages, containing (1) Project topic, (2) The availability of the dataset,
(3) analytic themes, (4) A conceptual dimensional model, and (5) A list of
references. Due March 5, Wednesday.
FAQ:
What does “The availability of the data set”
mean? You need to confirm the
data set used for the project is available
What is “analytic themes”? They are business needs or requirements for the project. You can
read the “Business needs” descriptions in Chapter 6, 7 and 8 of the textbook to
get the clue.
What is “A conceptual dimensional model”? The draft of the dimensional
model in a diagram format to be implemented later. See Chapter 6 for details.
Stage 2 (20%): Data
mart development. Refer to Chapter 6
of the textbook for the tasks in this stage: (1) dimensional model design (at
least four dimensions), and (2) data mart creation. The description of the data
mart will be 3-5 pages including figures or screenshots of the designed model. Due March 12, Wednesday.
Contents of the deliverables:
1.
A dimensional model diagram, either PowerPoint slide or the
screenshot of the data source view. This must be the refined version from the
“conceptual dimensional model” in Stage 1.
2.
Screenshots of design pages for each table from SQL Server
Management Studio.
3.
A brief description of the considerations in the dimensional
model design.
a.
How time dimension is defined? In what granularity?
b.
The hierarchy within a dimension, if any. (For example, State,
county, city, and street are in a hierarchical structure).
c.
The aggregate dimensions, if any. (For example, in MaxMinManufacturingDM, DimMachineType
is an aggregate from DimMachine).
d.
Indicate the primary key and foreign keys in each table.
e.
Others.
Stage 3 (20%): Data
mart populating. Refer to Chapter 7 of
the textbook for the tasks in this stage. The
deliverables: (1) The screenshot of ETL system logic design, (2) A brief
description of ETL system implementation, (3) Notes about data quality and
other issues during data preprocessing. It
will be 2-3 pages including figures and/or screenshots. Due
April 7, Monday.
Stage 4 (20%): Analysis report: The
analyses of the data in the data mart. The report contains:
(1) 4-6
findings from the data analysis.
(2) 2-4
diagram/charts/tables generated with SAS Enterprise Guide.
(3) An
optional choice of the data analysis is to do OLAP with the cube created from
your data mart. Try to wok out the OLAP application outcomes using SAS
Enterprise Guide or SSAS. You may “drill down” or “drill up” the OLAP cube.
Both SAS EG and SSAS Browser allow you to expand the dimension and to explore
the data in tables or graphs.
The report’s length is 2-4 pages plus figures and tables. Due April 21, Monday.
Stage 5 (30%): The
compilation of the previous deliverables with modifications and enhancements.
Hardcopy and electronic version are both needed, due May 7, Wednesday.
(1)
The report will
contain the following parts
a.
Cover page. Including
the team’s name, team members, title of the report, course number and name, and
the date the report is submitted.
b.
Table of contents
c.
One-page executive
summary
d.
Introduction section.
Mainly based on the contents of Deliverable #1 but with extensions:
-
The background of the
project, such as what are the businesses supported by this project.
-
Analytic themes, i.e.
business requirement to this project. Relevant business processes that generate
data and use the data are expected
-
The availability of
the dataset, including the quality of the data. You have process the data for
ETL. So you should have pretty much to say about this.
e.
Data mart development
section, mainly deliverable #2 with necessary modifications
f.
Data mart populating
section, mainly deliverable #3 with necessary modifications
g.
Analysis report
section, mainly deliverable #4 with necessary modifications (highly suggested
to do so)
h.
Summary section of the
project, 1-2 page. It may contain: what you have
learned, the top three comments on this project, the progress of the project
regarding the time schedule.
i.
Reference list. If a
reference is from the Internet, you need to list the title of the page in
addition to the url address.
(2)
A hardcopy of the
report will be submitted for grading, plus an electronic copy emailed to zhangxi.lin@hotmail.com.
------------------------------------------------
Project 2 Data Preparation
or Target Advertising (Graduate Students only)
Find
a web log dataset COMMREX.COM-ACCESS_LOG-9711.TXT from the shared directory \\BASRV1\ISQS 3358. Perform the
following tasks:
1)
Import
the text file into a SAS dataset
2)
Take
lstsch3.cgi as the target page to convert the file into the format with one row
per session. Follow the format the same as QUERY_DEC1997_VER3.XLS in \\BASRV1\ISQS 3358. If you have difficulty
to write SAS code to process the data, you may use the Summary Statistics data
describing function in SAS Enterprise Guide to process the data. Then copy the
generated code into Base SAS for modification.
3)
Use
SAS Enterprise Miner to perform classification modeling.
4)
The
final report in both hardcopy and electronic copy contains:
a.
The
SAS code for the project
b.
The
screenshot of the SAS Enterprise Guide result
c.
The
data mining results with necessary explanations
The
project is due on May 9 Friday.
-----------------------------------------------
Using the data set,
especially the dataset from real business, will help you earn extra credit
because of the originality. You need to explain the data collection and data
preprocessing tasks you have performed in the report. If you have difficulties
to find a good data set for your project, you may do one of the following as
your team project:
Project A:
Cotton/cottonseed data warehousing
If
you know the cotton agriculture, this project may interest you. The Excel files
\\BASRV1\ISQS3358\COTTON_DATA\ COTTON.XLS containing cotton/cottonseed dataset. The
following are tasks to be fulfilled:
1)
Study
the dataset “Cotton.xls” carefully. The main dataset is in “data” table. You
also need to study other worksheet. You will notice that the main dataset has
missed some values but this does not have major effect on this project.
2)
Define
5-8 analytic themes for the data warehouse.
3)
High
level dimensional model design. This is based on the paperwork
a.
Identify
2-4 business processes
b.
Declaring
the grain for the data warehouse
c.
Define
dimensions
d.
Identifying
the facts (according to the bus matrix)
4)
Detailed
dimensional model development. This will be based on computer work.
a.
You
must specify the primary key for each table.
b.
The
model must contain at least one Type 2 SCD, one conformed dimension, and any
other specific treatment of modeling.
c.
Indicate
the components of junk dimension, heterogeneous facts, etc.
5)
Dimensional
model review and validation.
6)
Populate
the data warehouse with the cotton/cottonseed data (use SSIS and Management
Studio for this ETL task).
7)
Develop
a few applications, such as queries, OLAP applications (use SSAS and Management
Studio for this).
Project B:
Commercial Bank Data Warehousing
If
you have good knowledge in finance this project may fit you. Please check the
datasets on the website of Federal Reserve Bank of
1)
Chooses
the dataset of four quarters in a year.
2)
Download
and extract the datasets one by one using Base SAS (double-click the file and
it will be extracted automatically to SAS’s Work library). You need to do this
four times for datasets in a year.
3)
Study
the variables of the dataset and conceive 2-4 analytic business questions.
Decide which columns will be used for data analysis. Use up to 100 variables
from at least two groups of columns to construct data warehouse for analysis
purpose. Make sure what you need and you understand the columns well according
to the explanations of the columns on http://www.chicagofed.org/economic_research_and_data/commercial_bank_dictionary.cfm.
4)
Data
warehouse dimensional model design
5)
Examine
the quality of the extracted data
6)
ETL
system design and implementation
7)
Data
warehouse population
8)
OLAP
implementation
9)
Comprehensive
data analysis
Project C:
AWC Data Mart Project
Use the AdventureWorksDM
data warehouse in OREDB Server. You need to study the data warehouse carefully
and identify your business themes in order to reduce the size of your data
mart. 6-10 dimensions will be good for your project. The Microsoft Data Warehouse Toolkit, Joy Mundy and Warren Thornthwaite,
Wiley, 2006, ISBN 0-471-26715-5, contains useful information about the data
warehouse.