ISQS 6339 Lecture Notes

 

Instructor: Zhangxi Lin

 


------- + ------ + ------- + ------- + ------- + -------

 

Lecture 1, 01/16/2014, Thursday

 

Topic: Introduction

1)       Basic BI Concepts

2)       BI trend

 

Review questions:

 

1)     What is BI?

2)     Why is BI getting hot?

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 2, 01/21/2014, Tuesday

 

Topic: Anatomy of Business Intelligence (1)

 

1)       Cases

2)       BI Framework

3)       Applications

4)       BI tools

 

Terminology: data, information, knowledge, business intelligence, data warehouse, meta data, ETL, business rules, OLAP

 

Reading:

1)     “How Much Information? 2003” (http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm)

2)     Find a BI application case from the web, and understand how it works.

3)     Find the paper “CACM2011 Overview of BI.pdf” on the network drive under ~\Texts\Readings\. Read it carefully.

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 3, 01/23/2014, Thursday

 

Topic: Anatomy of Business Intelligence (2)

 

1)       Database vs. data warehousing

2)       Data warehousing with Microsoft SQL Server 2008

 

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 4, 01/28/2014, Tuesday

 

Topic: Data warehousing

 

1)       Demonstration

2)       Concepts of data warehousing

3)       Architecture of data warehouse

4)       Data Integration

 

References:

1)     SQL Server Data Warehousing http://technet.microsoft.com/en-us/sqlserver/dd421879.aspx

2)     SQL Server Tutorial for Beginners http://www.youtube.com/watch?v=ZNObiptSMSI&list=PL08903FB7ACA1C2FB

3)     What’s SQL Server 2012? http://www.youtube.com/watch?v=3m95ie9Na-o 16’20”

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 5, 01/30/2014, Thursday

Topic: Data Warehousing

1)     Exercise 1

2)     Dimensional modeling

3)     Unified dimensional model

4)     Data warehousing methodology

5)     Three phases of dimensional modeling

 

References:

 

1)     Inmon vs. Kimball - Book

2)     Inmon vs. Kimball - comments

3)     Kimball and Inmon DW Models

4)     Design of data warehouse

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 6, 02/04/2014, Tuesday

 

Topic: Creating data mart

 

1)     Quiz 1

2)     Data warehousing with SQL Server 2008

3)     Fact tables

4)     Exercise 2 - Create a data mart

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 7, 02/06/2014, Thursday

 

Topic: Creating data mart (2)

 

1)     Quiz 1 review

2)     Exercise 2 Review

3)     Types of dimensions

4)     How to create data mart with SSAS - demo

5)     Exercise 3 – Create data mart with BIDS

 

Online References:

1)     Measures: http://en.wikipedia.org/wiki/Measure_%28data_warehouse%29

2)     Dimensions: http://en.wikipedia.org/wiki/Dimension_%28data_warehouse%29

3)     Slowly changing dimensions : http://en.wikipedia.org/wiki/Slowly_changing_dimension

4)     Surrogate keys : http://www.geekinterview.com/question_details/17591

5)     Aggregate: http://en.wikipedia.org/wiki/Aggregate_%28Data_Warehouse%29

6)     Many-to-many relationship in data warehousing: http://www.pythian.com/news/364/implementing-many-to-many-relationships-in-data-warehousing/

7)     Degenerate dimensions: http://en.wikipedia.org/wiki/Degenerate_dimension

 

Homework assignment 1 (optional):

 

Search the web for comprehensive explanations of the following terms. Give an example for each of them.

1)     Junk dimensions

2)     Many-to-many or multivalued dimensions

3)     Degenerate dimensions

 

Homework is due one week after the class meeting. Send your answers by email to isqs6347@gmail.com, with the subject line "ISQS 6339 homework 1 - <your name>". A late submission within a week after the deadline is also accepted, but may not receive timely feedback.

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 8, 02/11/2014, Tuesday

 

Topic: ETL system development (1)

 

1)     Term project orientation

2)     An introduction to SSIS

3)     Control flow tasks

4)     Data flow items

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 9, 02/13/2014, Thursday

 

Topic: ETL system development (2)

 

1)     ETL application debugging

2)     Exercise 4 – Populating Dimension Tables for Maximum Miniatures Manufacturing Data Mart (Guidelines)

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 10, 02/18/2014, Tuesday

 

Topic: ETL system development (3)

 

1)     Continue Exercise 4

2)     Extending ETL skills

3)     Illustrative example: Updating database

 

Homework assignment 2 (Due 03/04/2014, Tuesday):

 

The following assignments are optional. Students who complete any of them will receive extra credit.

 

Check Updating_database.htm.  Follow the instructions in the file to complete the ETL system development. Submit the results via email to isqs6347@gmail.com .

 

Note: PersonDetails01 & PersonDetails02 are in the network drive ~\ISQS3358\Downloaded\SSIS\RawFiles.

PersonDetails02 can be saved as a txt or csv file for the project, because Excel files have problems on the current Citrix SQL Server.

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 11, 02/20/2014, Thursday

 

Topic: Cubism (1)

 

1)     Quiz 2

2)     MaxMinManufacture Data Mart Debugging

3)     A simple cube for MaxMinManufacture Data Mart

4)     Defining a time dimension

5)     Exercise 5 – Populating Fact Tables (Guidelines)

 

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 12, 02/25/2014, Tuesday

 

Topic: Cubism (2)

 

1)     Advanced topics in OLAP

2)     How to create a time dimension

3)     Exercise 6 – Create cube for MaxMinManufacture Data Mart

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 13, 02/27/2014, Thursday

 

Topic: Cubism (3)

 

1)     Continue Exercise 6

 

References:

1)     SQL Server Data Mining: http://www.sqlserverdatamining.com/ssdm/Home/Tutorials/tabid/57/Default.aspx

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 14, 03/04/2014, Tuesday

 

Topic: Enterprise Guide – Getting Started

1)             Quiz 3

2)             Introduction

3)             Data access and process

4)             Listing report

5)             Exercise EG-EX1

Readings: EG Chapter 1-3, Prog-I

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 15, 03/06/2014, Thursday

 

Topic: Enterprise Guide - Tasks

 

1)     EG tasks

a.     Frequency reports

b.     Variable parameters

2)     Exercise EG EX1

 

Readings: EG Chapter 3

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 16, 03/11/2014, Tuesday

 

Big data symposium attendance (no class)

 

BigData EX1

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 17, 03/13/2014, Thursday

 

Topic: Enterprise Guide

 

Exercise EG-EX2

 

Readings: EG Chapter 4, Prog-I

 

------- + ------ + ------- + ------- + ------- + -------

 

 

Spring Break

 

------- + ------ + ------- + ------- + ------- + -------

 

Lecture 18, 03/25/2014, Tuesday

 

Topic: Enterprise Guide

 

1)     Exercise EG EX3

2)     Quiz 4

 

Readings: EG Chapter 5

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 19, 03/27/2014, Thursday

 

Topic: Enterprise Guide – Statistical analysis with EG 

 

Exercise EG-EX4

 

Readings: EGBS-2 

------- + ------ + ------- + ------- + ------- + -------

Lecture 20, 04/01/2014, Tuesday

 

Topic: Introduction to SAS Programming

 

Readings: SAS-1 Ch 1-4

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 21, 04/03/2014, Thursday

 

Topic: Introduction to SAS Programming

 

Readings: SAS-1 Ch 5-6

------- + ------ + ------- + ------- + ------- + -------

Lecture 22, 04/08/2014, Tuesday

 

Topic: Big Data & Cloud Computing (Group presentations)

 

Groups present in the following order:

 

Order | Presenting Team | Discussant Team

Choose one of the following topics, or another of your own, for your team’s presentation:

1.     Current status of big data applications, taking one case, such as digital city, healthcare, transportation, financial market, education, e-commerce, etc.

2.     The main technologies related to big data, taking one of the following: architecture, data management, distributed processing, cloud computing, resource allocation, etc.

3.     A recap of one of the presentations from the big data symposium, with your comments

4.     A successful case of big data services

Each team will have only 10 minutes to present, allowing 2 minutes for Q&A.

Slides are required; 6-8 slides are enough. Please notify me of the title and send me the slides before the coming Tuesday morning (send them to my TTU email address directly). I will then be able to list your team’s topic in the agenda.

 

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 23, 04/10/2014, Thursday

 

Topic: Hadoop & MapReduce

 

1)     Quiz 5 (SAS Programming, open book/open notes)

2)     Hadoop

3)     MapReduce

 

Online videos:

1)     What is Hadoop, http://www.youtube.com/watch?v=OoEpfb6yga8, 14:00

2)     Introduction to MapReduce, http://www.youtube.com/watch?v=ht3dNvdNDzI, 11:31

3)     The modern data operating system: http://www.youtube.com/watch?v=d2xeNpfzsYI, 1:16:44

4)     Cloud computing and MapReduce, http://www.youtube.com/watch?v=yjPBkvYh-ss, 46:17

5)     MapReduce, http://www.youtube.com/watch?v=zVSSsJ_ua4Q, 38:24

6)     Google MapReduce roundtable, http://www.youtube.com/watch?v=NXCIItzkn3E, 25:28

7)     Hadoop Mahout: http://www.youtube.com/watch?v=WB9zr0IZCPQ, 3:06

8)     R & Hadoop: http://www.youtube.com/watch?v=QEaOfTuveGg, 1:05:25

Readings:

1)     Hadoop Tutorial: http://developer.yahoo.com/hadoop/tutorial/

2)     MapReduce Tutorial: http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html

3)     Hadoop Slides: http://www.slideshare.net/awesomesos/hadoop-tutorial

4)      K-Means Clustering with MapReduce: http://horicky.blogspot.com/2011/04/k-means-clustering-in-map-reduce.html

5)     Apache Mahout: http://mahout.apache.org/

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 24, 04/15/2014, Tuesday

 

Topic: HDFS, HIVE, HBase, and NoSQL

 

 

Online videos:

1)     HDFS http://www.youtube.com/watch?v=ziqx2hJY8Hg, 33:36

2)     IBM HIVE Application Demo: http://www.youtube.com/watch?v=OzbiAEPAWwQ, 10:00

3)     HIVE: http://www.youtube.com/watch?v=Y3UXDtDR9bg (1), http://www.youtube.com/watch?v=1hDhpVmeSGI (2)

 

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 25, 04/17/2014, Thursday

 

Topic: Guest speakers

 

·         12:30-1:00p: Jonghyun Kim, “Perl programming for SQL Server Admin”

·         1:00-1:50p: Dale Ganus, “Harnessing IT advancement for your career”

 

Moderators: Tianxi Dong, Siming Li

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 26, 04/22/2014, Tuesday

 

Topic: Apache ZooKeeper & Pig

 

Online videos:

1)     Storm: http://www.youtube.com/watch?v=Ycvg0PCQ-sM

2)       http://www.slideshare.net/Hadoop_Summit/realtime-analytics-with-storm

3)       Analyzing Big Data with Twitter - http://www.youtube.com/watch?v=UiIjEzW3br8, 1:21:40

4)      Using Apache Pig With Amazon Elastic MapReduce http://www.youtube.com/watch?v=iMOzC835H4I, 5:14

5)       http://www.youtube.com/watch?v=Z7EBdF6Bk3E

6)       http://www.youtube.com/watch?v=Kgf9EjTNucM

7)       https://www.youtube.com/watch?v=ZC0kMiKKbug

8)       http://vimeo.com/26017227

 

Readings:

1)      Apache ZooKeeper: http://zookeeper.apache.org/

2)      Apache Pig: http://pig.apache.org/

3)       Overview-http://zookeeper.apache.org/doc/r3.2.2/zookeeperOver.html

4)       Zookeeper Future : http://www.slideshare.net/cloudera/zookeeper-futures

5)       Wiki : http://en.wikipedia.org/wiki/Apache_ZooKeeper

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 27, 04/24/2014, Thursday

 

Topic: Mahout, Amazon EC2

 

 

Online videos:

1)      Introduction to Apache Mahout: http://www.youtube.com/watch?v=WB9zr0IZCPQ, 3:06

2)     Amazon EC2 15 minute introduction: http://www.youtube.com/watch?v=ZAB8wCg9MyE, 15:00

 

Readings:

1)      Amazon EC2: http://aws.amazon.com/ec2/

2)     Amazon EC2 scale: http://www.zdnet.com/blog/open-source/amazon-ec2-cloud-is-made-up-of-almost-half-a-million-linux-servers/10620

3)      Mahout: http://mahout.apache.org/

 

 

References by Team EMC:

 

·         Introducing Apache Mahout (2009). IBM. Retrieved from https://www.ibm.com/developerworks/java/library/j-mahout/index.html

·         Olety,  V. (2012, January 31). Apache Mahout: Machine Learning for Big Data. 8KMiles. Retrieved from http://cloudblog.8kmiles.com/2012/01/31/apache-mahout-machine-learning-for-big-data

·         Mahout Wiki: https://cwiki.apache.org/MAHOUT/mahout-wiki.html   

·         https://cwiki.apache.org/confluence/display/MAHOUT/Powered+By+Mahout

·         http://www.ibm.com/developerworks/java/library/j-mahout/

·         http://www.infoq.com/news/2009/04/mahout

·         Video: http://www.youtube.com/watch?v=WB9zr0IZCPQ

 

References for 'How to use Amazon EC2' from Team-Citigroup.

 

Videos:

https://www.youtube.com/watch?v=RkVSkL76U-M

https://www.youtube.com/watch?v=xrxQXfE7t9A

http://www.youtube.com/watch?v=bBajLxeKqoY

http://www.youtube.com/watch?v=_6n6Wqbtjqo

 

Website: 

http://aws.amazon.com/ec2/

 

FAQs:

http://aws.amazon.com/ec2/faqs/

 

Running Hadoop on Amazon EC2:

http://wiki.apache.org/hadoop/AmazonEC2

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 28, 04/29/2014, Tuesday

 

Topic: Final exam review

 

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 29, 05/01/2014, Thursday

 

Topic: Term Project Presentation (1)

 

------- + ------ + ------- + ------- + ------- + -------

Lecture 30, 05/06/2014, Tuesday

 

Topic: Term Project Presentation (2)

 

------- + ------ + ------- + ------- + ------- + -------