Syllabus for CS 327E Elements of Databases - Fall 2021

Class Meetings: Friday 4:00pm - 7:00pm
Class Location: Zoom and GDC 1.304*

Instructor: Shirley Cohen
Email: scohen at cs dot utexas dot edu
Office Hours: Mondays from 7:00pm - 8:00pm on Zoom

TA: Karan Sadananda Karnad
Email: karan dot karnad at utexas dot edu
Office Hours: Tuesdays and Wednesdays from 9:30am - 11:00am on Zoom

TA: Sameer Haniyur
Email: sameerhaniyur2 at utexas dot edu
Office Hours: Thursdays from 11:30am - 1:00pm and Fridays from 8:30am - 10:00am

*The class will meet both on Zoom and in person. Even when we meet in person, the class will be available via Zoom for students who choose to join virtually. For all Zoom links (class meetings and office hours), please see Canvas.

Course Description:
This course is designed to give students a practical understanding of databases and data systems. The goal is to learn modern data management and data processing techniques through a mix of best practices, experimentation, and problem solving.

The content of the course is organized into three broad areas: 1) query languages with an emphasis on SQL; 2) data models from relational to document to graph; and 3) data engineering, including data processing and scalability testing.

We will construct several operational and analytical databases throughout the term. This work will be done on Google Cloud Platform using a variety of database technologies and data science tools: MySQL, Postgres, BigQuery, Spanner, Firestore, MongoDB, Neo4j, Jupyter Notebooks, and Data Studio.

Below are some of the topics we will cover:

SQL:
- select-from-where
- order-bys
- joins
- inserts, updates, deletes
- aggregates
- group-bys
- subqueries

Data Models:
- relational
- document
- graph
- nested

Data Engineering:
- ingestion
- data transformations
- data visualizations
- scalability testing

Prerequisites:
The course assumes a programming background and, in particular, a solid working knowledge of Python scripting. As such, the prerequisites for this course are CS 303E, CS 307, or the equivalent. Familiarity with SQL is also helpful, but not required.

Textbooks:
There are two required texts for this course:
- Alan Beaulieu, Learning SQL, Third Edition, 2020.
- Dan Sullivan, NoSQL for Mere Mortals, First Edition, 2015.

Supplemental Readings:
In addition to the required readings, the assignments will require regularly consulting the product documentation on Cloud SQL, Cloud Spanner, BigQuery, Firestore, MongoDB, Neo4j, and Data Studio. All documentation is available online.

Projects:
The projects are the most important component of this course. They are intended to give you hands-on experience with the database systems and tools. They will start with basic data management operations and move on to more advanced capabilities.

There are two types of projects: weekly projects and the Final Project. The weekly projects are aimed at giving you practice with each of the database systems. They will be assigned as homework and will require time outside of class to complete. The Final Project will be a scalability study of a chosen database system. You will design experiments to evaluate the scalability of the system and document your findings in a written report.

All projects will be carried out in groups of two students. You will form groups at the start of the term and work with the same partner throughout the term. More details on the projects will be provided in the week-by-week section below.

Exams:
There will be two midterms and no final exam. The tests are comprehensive and will cover all the material to date, including readings, projects, and lectures. They will be open-book and taken during class time via Canvas. Unfortunately, no make-up tests will be offered due to our limited resources.

Participation:
We will be holding synchronous class meetings so that you have the opportunity to ask questions and work together with other students. My goal is to spend the majority of class time actively working through problems and clarifying difficult concepts. You will need to have a stable internet connection and a laptop or desktop computer so that you can participate.

Participation questions will be multiple choice and answered with UT Instapoll via Canvas.

Absences:
Excused absences may be given only for verifiable medical or family emergencies. Written documentation must be provided to qualify for an excused absence. The medical documentation must specifically state that you could not attend class due to your illness and must be signed by a physician. A job or internship interview or any other appointment does not constitute an excused absence.

Grading Rubric:
The basic grading rubric consists of the four components listed below:

Note: The final grade will use the plus/minus grading system.

Late Submission Policy:
Late submissions receive a 10% grade reduction for each day they are late. This applies to all project submissions throughout the term.

Tools:
- Zoom for online instruction.
- Google Cloud Platform for practice problems and project work.
- GitHub for code repository, version control, and how-to guides.
- Lucidchart for diagramming.
- Piazza for asynchronous communication (announcements, questions, discussions).
- Canvas for grade reporting.

Academic Integrity:
This course will abide by UTCS' code of academic integrity.

Students with Disabilities:
Students with disabilities may request appropriate academic accommodations.

Week-by-Week Schedule:
Below is a week-by-week schedule that includes the important milestones and assigned readings:

Acknowledgments:
This course is generously supported by Google, which gives us access to its Cloud Platform.