ISQS 6339 Data Management and Business Intelligence

 

Instructor: Zhangxi Lin

 

------ + ------ + ------- + ------- + ------- + -------

 

Project 1, Due: October 30, Tuesday

 

Cotton/cottonseed data warehousing (subject to refining)

 

The Excel files containing cotton/cottonseed dataset is available in the shared directory. The following are tasks to be fulfilled:

1)       Study the dataset “Cotton.xls” carefully. The main dataset is in “data” table. You also need to study other worksheet. You will notice that the main dataset has missed some values but this does not have major effect on this project.

2)       Define 5-8 analytic themes for the data warehouse. Due October 9.

3)       High level dimensional model design. This is based on the paperwork Due October 9

a.       Identify 2-4 business processes

b.       Declaring the grain for the data warehouse

c.       Define dimensions

d.       Identifying the facts (according to the bus matrix)

4)       Detailed dimensional model development. This will be based on computer work. Due October 16

a.       You must specify the primary key for each table.

b.                               The model must contain at least one Type 2 SCD, one conformed dimension, and any other specific treatment of modeling.

c.       Indicate the components of junk dimension, heterogeneous facts, etc.

5)       Dimensional model review and validation. Due October 23

6)       Populate the data warehouse with the cotton/cottonseed data (use SSIS and Management Studio for this ETL task). Due October 23

7)       Develop a few applications, such as queries, OLAP applications (use SSAS and Management Studio for this). Due October 30

 

Deliverables:

1)       A report to describe the logic design of the data warehouse (task 1-4). Due October 16 (10%)

2)       An email to notify the completion of a development environment, i.e. the populated data warehouse (task 5, 6). Present one of ETL tasks in the class. Due October 23 (30%)

3)       Data warehouse demonstration. Due October 30 (30%)

4)       An upgraded report containing the outcomes of queries and OLAP applications. Due October 30 (30%)

 

The report must be emailed to the instructor.

 

Note: A project group consists of two students. However, you are encouraged to work alone.

 

------ + ------ + ------- + ------- + ------- + -------

 

Project 2, Due: December 10, Monday

 

Commercial Bank Data Warehousing and Data Analysis

 

Please check the datasets on the website of Federal Reserve Bank of Chicago http://www.chicagofed.org/economic_research_and_data/commercial_bank_complete_files_2001_2006.cfm. The datasets are pretty good – good for research and exercises.

 

The project will be conducted in three stages

 

Stage 1: Business question analysis and data review (20%)

 

1)       Each of you chooses the dataset of four quarters in a year. The last digit of the year matches the last number of your SQL Server 2005 login ID.

2)       Download and extract the datasets one by one using Base SAS (double-click the file and it will be extracted automatically to SAS’s Work library). You need to do this four times for datasets in a year.

3)       Study the variables of the dataset and conceive 2-4 analytic business questions. Decide which columns will be used for data analysis. You will use up to 100 variables from at least two groups of columns to construct data warehouse for analysis purpose. Make sure what you need and you understand the columns well according to the explanations of the columns on http://www.chicagofed.org/economic_research_and_data/commercial_bank_dictionary.cfm.

 

Deliverables due Nov 20: (1) 2-4 analytic business questions (analytic themes), (2) The columns to be used for data warehousing (descriptions must be included).

 

Stage 2: Data preparation (30%)

1)       Data warehouse dimensional model design

2)       Write a SAS program to extract the selected columns into text files to be loaded later

3)       Examine the quality of the extracted data

4)       ETL system design and implementation

5)       Data warehouse population

 

Deliverables due Nov 29: (1) dimensional model design, (2) ETL system design and implementation, (3) Populated data warehouse, and (4) A brief description of the completed tasks (1-2 pages).

 

Stage 3: Data Analysis (50%)

1)       OLAP implementation

2)       Comprehensive data analysis

 

Deliverables due Dec 10: The final report with (1) all previous documents, (2) the analysis of the data with tables, charts and findings, (3) references if any.

 

Note: This project will be accomplished individually.