Syllabus for CS 327E Elements of Databases - Spring 2017

Class Times: Mondays and Wednesdays 6:30pm - 8pm
Class Location: CLA 1.104

Instructor: Shirley Cohen
Email: scohen at cs dot utexas dot edu
Office hours: After class or by appointment in CLA 0.108

TA: Alan Ma
Email: alanma0907 at utexas dot edu
Office hours: Thursdays 12pm - 2pm in the GDC 3rd floor lab

TA: Sean Wang
Email: seanyusa at utexas dot edu
Office hours: Fridays 9:30am - 11am in the GDC basement

Course Description:
This course is designed to give students a practical understanding of database technologies. The focus is on designing and building databases and SQL interfaces, rather than on the theory behind them.

The content of the course is organized into three main topics: 1) schema design with a focus on designing for analytical workloads; 2) SQL with an emphasis on the standard features of the query language as well as user-defined types and functions; and 3) database application development, namely loading structured datasets into a database, applying transforms to cleanse and restructure the raw data, and querying the transformed data from a reporting application. The course will also cover such topics as cloud computing, data warehousing, and big data.

Throughout the term we will use Postgres and Amazon Redshift for demos, labs, and projects. I have chosen these two database engines because they are easy to use, yet expressive and extensible, and particularly well-suited for processing analytical workloads.

There will be three lab projects, each worth 10%. The labs will build on each other as well as build up to the final project, worth 30%. Through the labs and final project, students will develop an end-to-end data analytics pipeline using Open Data, Postgres, Python, Amazon Redshift, and Amazon QuickSight. This pipeline will have a loader program that ingests two or more related datasets from csv into a database. Once loaded, the data will be transformed and standardized into a structure that is suitable for analysis and reporting. A reporting dashboard will access the database tables and perform some basic visualizations. The pipeline will process a high volume of data to illustrate real-world challenges when building analytics databases at scale.

Students will work in pairs on the labs and final project. The code for all deliverables will be kept under version control on Github. At the end of the term, each group will submit a technical report that describes their project and lessons learned throughout the various assignments. Students will choose their lab partner at the start of the term and work with the same partner throughout.

The course will be taught using active learning techniques as much as possible. The goal is to spend the majority of class time clarifying difficult concepts and actively working through challenge problems, labs, and the final project rather than listening to a traditional lecture. In order for this to happen, students must learn the basic concepts before coming to class by doing the assigned readings and exercices. As an extra incentive for keeping up with the readings, there will be a short quiz at the start of class based on the assigned readings for that day. In addition, students will earn points for participating in class exercises.

Below is a list of some of the topics we will be covering throughout the term:

Database Design:
-logical database design
-inheritance
-normalization and denormalization
-physical database design
-indexing

SQL:
-select-from-where
-order-bys
-joins
-inserts, updates, deletes
-aggregates and group-bys
-views
-user-defined types and functions
-users, roles, grants

Application Development:
-data ingestion
-data transformation
-data exchange
-data integration
-input validation and protections against SQL injections

Prerequisites:
This course expects familiarity with programming in Python. CS 303E, CS 307 or the equivalent is required.

Textbooks:
There are three required texts for the course and one optional. If you are new to Python, I highly recommend that you obtain all four texts.

Required texts:
Jan Harrington's Relational Database Design and Implementation, 4th edition, 2016. Available as an ebook through the UT Library.
Juba, Vannahme, and Volkov's Learning PostgreSQL, 1st edition, 2015.
Amazon's Redshift Developer Guide, API Version 2012-12-01.
Amazon's QuickSight User Guide, 2017.

Optional text:
Mark Lutz's Learning Python, 5th edition, 2013. Available as an ebook through the UT Library.

Clickers:
We will be using clickers for reading quizzes and class participation. Students will need to have their own clickers and bring them to class each time. Either the newer i>clicker 2 or the older i>clicker model can be used. The University Co-op carries the newer i>clicker 2 model. Remember to register your clicker with this class through Canvas (click on the i>Clicker link from the left navigation menu and follow the instructions).

Grading Rubric:
The basic grading rubric is comprised of the five components listed below:

Note that there is no final for this class and the midterm will be a closed book exam.

Late Submission Policy:
10% reduction in the grade per day.

Resources:
Amazon Web Services for exercises, labs, and final project.
Github for code repository and version control.
Lucidchart for diagramming.
Piazza for class discussions.
Canvas for grade reporting.

Academic Integrity:
This course will abide by UTCS' code of academic integrity.

Students with Disabilities:
Students with disabilities may request appropriate academic accommodations.

Week-by-Week Schedule:
Below is a week-by-week schedule that includes the important milestones and assigned readings: