Syllabus for CS 327E Elements of Databases - Fall 2020

Class Times: Fridays 4:00pm - 7:00pm
Class Location: Zoom

Instructor: Shirley Cohen
Email: scohen at cs dot utexas dot edu
Office hours: After class or by appointment on Zoom

TA: Zongying Mo
Email: zm3998 at utexas dot edu
Office hours: Monday and Wednesday 3:30pm - 5:00pm on Zoom

TA: Yin Deng
Email: ydeng at cs dot utexas dot edu
Office hours: Tuesday and Thursday 5:00pm - 6:00pm and Saturday 2:00pm - 3:00pm on Zoom

For all Zoom links (class meetings and office hours), please see Canvas.

Course Description:
This course is designed to give students a practical understanding of databases and data systems. The goal is to learn modern data management and data processing techniques through a mix of best practices, experimentation, and problem solving.

The content of the course is organized into three broad areas: 1) query languages with an emphasis on SQL; 2) data models from relational to document to graph; and 3) data engineering, including processing data at scale.

We will construct several databases ranging from transactional to analytical and everything in between. This work will be implemented on Google Cloud Platform using a variety of databases and data science tools: MySQL, Postgres, Spanner, Firestore, MongoDB, Neo4j, BigQuery, Apache Beam and Dataflow, Jupyter Notebooks, and Data Studio.

Below are some of the topics we will be covering during the term:

SQL:
- select-from-where
- order-bys
- joins
- inserts, updates, deletes
- aggregates
- group-bys
- subqueries

Data Models:
- relational
- document
- graph
- nested

Data Engineering:
- ingestion
- transformations
- enrichment
- data visualization

Prerequisites:
The course assumes a programming background and, in particular, a solid working knowledge of Python scripting. Accordingly, the prerequisites for this course are CS 303E, CS 307, or the equivalent. Familiarity with SQL is also helpful, but not required.

Textbooks:
There are two required texts for this course:
- Josephine Bush, Learn SQL Database Programming, First Edition, 2020.
- Aaron Ploetz et al., Seven NoSQL Databases in a Week, First Edition, 2018.

Supplemental Readings:
The course requires consulting the product documentation on Cloud SQL, Cloud Spanner, Firestore, BigQuery, Apache Beam, Dataflow, and Data Studio. This documentation is updated regularly, and we will refer to it frequently throughout the semester.

Projects:
The projects are the most important component of this course. They are intended to give you hands-on experience with the database systems and tools. They will start with basic data management operations and move on to advanced query capabilities. Projects will be carried out in groups of two students. You will form groups at the start of the term and work with the same partner throughout the term. More details on the projects will be provided in the week-by-week section below.

Tests:
There will be three midterms and no final exam. The tests are comprehensive and will cover all the material to date, including readings, projects, and lectures. They will be open-book and taken asynchronously. Unfortunately, no make-up tests will be offered due to our limited resources.

Participation:
We will be holding synchronous class meetings so that you have the opportunity to ask questions and work together with other students. My goal is to spend the majority of class time clarifying difficult concepts and actively working through problems rather than delivering a traditional lecture. You will need to have a stable internet connection and a laptop or desktop computer so that you can participate.

There will be two types of participation questions. The first type is based on class exercises, which you will work on in pairs and answer with UT Instapoll via Canvas. The second type uses the Socratic method: I will ask individual students a series of questions about the material. All students will be called on at least once during the term.

Absences:
Excused absences may be given only for verifiable medical or family emergencies. Written documentation must be provided to qualify for an excused absence. The medical documentation must specifically state that you could not attend class due to your illness and must be signed by a physician. A job or internship interview or any other appointment does not constitute an excused absence.

Grading Rubric:
The basic grading rubric consists of the four components listed below:

Note: The final grade will use the plus/minus grading system.

Late Submission Policy:
Late submissions receive a 10% grade reduction for each day they are late. This applies to all project submissions throughout the term.

Tools:
- Zoom for online instruction.
- Google Cloud Platform for practice problems and project work.
- GitHub for code repository, version control, and how-to guides.
- Lucidchart for diagramming.
- Piazza for asynchronous communication (announcements, questions, discussions).
- Canvas for grade reporting.

Academic Integrity:
This course will abide by the UTCS code of academic integrity.

Students with Disabilities:
Students with disabilities may request appropriate academic accommodations.

Week-by-Week Schedule:
Below is a week-by-week schedule that includes the important milestones and assigned readings:

Acknowledgments:
This course is generously supported by Google, which provides us with access to Google Cloud Platform.