Syllabus for CS 327E Elements of Databases - Spring 2019

Class Times: Mondays 6:00pm - 9:00pm
Class Location: RLP 1.106

Instructor: Shirley Cohen
Email: scohen at cs dot utexas dot edu
Office hours: After class or by appointment in RLP 1.106

TA: Prithvi Chowhan
Email: chowhan at utexas dot edu
Office hours: Wednesdays from 12:30 to 1:30pm and Fridays from 2pm to 3pm at TA Stations, GDC basement

TA: William Chia
Email: william dot chia at utexas dot edu
Office hours: Tuesdays and Thursdays from 1pm - 2pm at TA Stations, GDC basement

Course Description:
This course is designed to give students a practical understanding of databases and data systems. The goal is to learn modern data management and data processing techniques through lots of problem solving and hands-on projects.

The content of the course is organized into three broad areas: 1) SQL with an emphasis on the standard features of the language; 2) data models from relational to hierarchical; and 3) dataflow programming for processing data at-scale.

We will first construct a simple analytics database from structured data. We will then expand the scope of the database with additional datasets from a variety of sources. We will explore the raw data through SQL and gradually enrich and transform the data to create a single view across datasets. This work will be implemented on Google Cloud Platform using BigQuery, Apache Beam/Dataflow, Apache Airflow/Composer, and Data Studio.

Below are some of the topics we will be covering during the term:

- select-from-where
- order-bys
- joins
- inserts, updates, deletes
- aggregates
- group-bys
- subqueries
- functions

Data Modeling:
- referential integrity
- normalization
- hierarchies
- repeated fields
- views
- partitioning

Dataflow Programming:
- ingestion
- cleansing
- transformation
- data integration
- workflow orchestration

The course assumes a programming background and in particular, a solid working knowledge of Python scripting. As such, the prerequisites for this course are CS 303E, CS 307 or the equivalent. Familiarity with SQL is also helpful, but not required.

There is one required text for this class:
- Clare Churcher, Beginning SQL Queries: From Notice to Professional, Second Edition, 2016. Available as an ebook from the UT Library.

Supplemental Readings:
The course requires pouring over product documentation for BigQuery, Apache Beam/Dataflow, Apache Airflow/Composer, and Data Studio. The product documentation is all available online.

There will be weekly quizzes based on assigned readings. The quizzes will have 5 multiple-choice questions and they will be taken at the start of class with your iClicker. iClickers are available as a remote and mobile app. Choose one of these options (not both) and register your device through Canvas.

The most important component of this course is the project. The project will be to construct a data warehouse that brings together previously siloed datasets with the goal of deriving some interesting insights from the combined data. You will pair up with another student in the class and work collaboratively on the project. Each team will choose their datasets and formulate their queries for analysis. The project will be divided into several milestones and will span the entire term.

There is only one exam in this course which is a midterm. The exam is comprehensive and will cover all the material on SQL, data modeling, and dataflow programming seen up to that point in the semester. It is a closed-book exam and it will be done in class. No make-up exam will be offered.

This class will be taught using active learning techniques. The goal is to spend the majority of class time clarifying difficult concepts and actively wo rking through problems rather than listening to a traditional lecture. You will need to bring a laptop to class and sit with your partner to able to work together on project milestones. Participation questions will be answered with your iClicker and they will be based on practice problems and project mile stones.

Grading Rubric:
The basic grading rubric is comprised of the four components listed below:

Note: The final grade will use the plus/minus grading system.

Late Submission Policy:
There is a 10% reduction in the grade per day. This applies to all project submissions throughout the term.

- Google Cloud Platform for practice problems and project work.
- GitHub for code repository, version control, and how-to guides.
- Lucidchart for diagramming.
- Piazza for asynchronous communication (announcements, questions, discussions).
- Canvas for grade reporting.

Academic Integrity:
This course will abide by UTCS' code of academic integrity.

Students with Disabilities:
Students with disabilities may request appropriate academic accommodations.

Week-by-Week Schedule:
Below is a week-by-week schedule that includes the important milestones and assigned readings:
This course is generously supported by Google by giving us access to their Cloud Platform.