ISQS 6339 Data Management and Business Intelligence
Instructor:
------ +
------ + ------- + ------- + ------- + -------
Project 1, Due: October 30, Tuesday
Cotton/cottonseed data warehousing
(subject to refining)
The Excel files containing cotton/cottonseed dataset
is available in the shared directory. The following are tasks to be fulfilled:
1)
Study
the dataset “Cotton.xls” carefully. The main dataset is in “data” table. You
also need to study other worksheet. You will notice that the main dataset has missed
some values but this does not have major effect on this project.
2)
Define
5-8 analytic themes for the data warehouse. Due October 9.
3)
High
level dimensional model design. This is based on the paperwork Due October 9
a.
Identify
2-4 business processes
b.
Declaring
the grain for the data warehouse
c.
Define
dimensions
d.
Identifying
the facts (according to the bus matrix)
4)
Detailed
dimensional model development. This will be based on computer work. Due October
16
a.
You
must specify the primary key for each table.
b.
The
model must contain at least one Type 2 SCD, one conformed dimension, and any
other specific treatment of modeling.
c.
Indicate
the components of junk dimension, heterogeneous facts, etc.
5)
Dimensional
model review and validation. Due October 23
6)
Populate
the data warehouse with the cotton/cottonseed data (use SSIS and Management
Studio for this ETL task). Due October 23
7)
Develop
a few applications, such as queries, OLAP applications (use SSAS and Management
Studio for this). Due October 30
Deliverables:
1)
A
report to describe the logic design of the data warehouse (task 1-4). Due
October 16 (10%)
2)
An
email to notify the completion of a development environment, i.e. the populated
data warehouse (task 5, 6). Present one of ETL tasks in the class. Due October
23 (30%)
3)
Data
warehouse demonstration. Due October 30 (30%)
4)
An
upgraded report containing the outcomes of queries and OLAP applications. Due
October 30 (30%)
The report must be
emailed to the instructor.
Note: A project group consists
of two students. However, you are encouraged to work alone.
------ + ------
+ ------- + ------- + ------- + -------
Project 2, Due:
December 10, Monday
Commercial Bank Data Warehousing and Data Analysis
Please check the datasets on the website of Federal
Reserve Bank of
The project will be conducted in three stages
Stage 1: Business question analysis and data review
(20%)
1)
Each
of you chooses the dataset of four quarters in a year. The last digit of the
year matches the last number of your SQL Server 2005 login ID.
2)
Download
and extract the datasets one by one using Base SAS (double-click the file and
it will be extracted automatically to SAS’s Work
library). You need to do this four times for datasets in a year.
3)
Study
the variables of the dataset and conceive 2-4 analytic business questions.
Decide which columns will be used for data analysis. You will use up to 100
variables from at least two groups of columns to construct data warehouse for
analysis purpose. Make sure what you need and you understand the columns well
according to the explanations of the columns on http://www.chicagofed.org/economic_research_and_data/commercial_bank_dictionary.cfm.
Deliverables due
Nov 20: (1) 2-4 analytic business questions (analytic themes), (2) The
columns to be used for data warehousing (descriptions must be included).
Stage 2: Data preparation (30%)
1)
Data
warehouse dimensional model design
2)
Write
a SAS program to extract the selected columns into text files to be loaded
later
3)
Examine
the quality of the extracted data
4)
ETL
system design and implementation
5)
Data
warehouse population
Deliverables due Nov
29: (1) dimensional model design, (2) ETL system design and implementation,
(3) Populated data warehouse, and (4) A brief description of the completed
tasks (1-2 pages).
Stage 3: Data Analysis (50%)
1)
OLAP
implementation
2)
Comprehensive
data analysis
Deliverables due Dec
10: The final report with (1) all previous documents, (2) the analysis of
the data with tables, charts and findings, (3) references if any.
Note: This project
will be accomplished individually.