Class Times: Fridays 4:00pm - 7:00pm
Class Location: Zoom
Instructor: Shirley Cohen
Email: scohen at cs dot utexas dot edu
Office hours: Mondays from 7:00pm - 8:00pm on Zoom
TA: Karan Sadananda Karnad
Email: karan dot karnad at utexas dot edu
Office hours: Wednesdays and Thursdays 9:30am - 11:00am on Zoom
TA: Zhaosong Zhu
Email: zhaosong dot zhu at utexas dot edu
Office hours: Tuesdays from 2:30pm - 4:00pm and Thursdays from 4:00pm - 5:30pm on Zoom
For all Zoom links (class meetings and office hours), please see Canvas.
This course is designed to give students a practical understanding of databases and data systems. The goal is to learn modern data management and data processing techniques through a mix of best practices, experimentation, and problem solving.
The content of the course is organized into three broad areas: 1) query languages with an emphasis on SQL; 2) data models from relational to document to graph; and 3) data engineering, including processing data at scale.
We will construct several databases ranging from transactional to analytical and everything in between. This work will be implemented on Google Cloud Platform using a variety of databases and data science tools: MySQL, Postgres, Spanner, Firestore, MongoDB, Neo4j, BigQuery, Apache Beam and Dataflow, Jupyter Notebooks, and Data Studio.
Below are some of the topics we will be covering during the term:
- inserts, updates, deletes
- data visualization
The course assumes a programming background and, in particular, a solid working knowledge of Python scripting. As such, the prerequisites for this course are CS 303E, CS 307, or the equivalent. Familiarity with SQL is also helpful, but not required.
There are two required texts for this course:
- Josephine Bush, Learn SQL Database Programming, First Edition, 2020.
- Aaron Ploetz et al., Seven NoSQL Databases in a Week, First Edition, 2018.
The course also requires consulting the product documentation for Cloud SQL, Cloud Spanner, Firestore, BigQuery, Apache Beam, Dataflow, and Data Studio. The documentation is updated regularly, and you will need to read it frequently throughout the semester.
The projects are the most important component of this course. They are intended to give you hands-on experience with the database systems and tools. They will start with basic data management operations and move on to advanced query capabilities. Projects will be carried out in groups of two students. You will form groups at the start of the term and work with the same partner throughout the term. More details on the projects will be provided in the week-by-week section below.
There will be three midterms and no final exam. The tests are comprehensive and will cover all the material to date, including readings, projects, and lectures. They will be open-book and done asynchronously. Unfortunately, no make-up tests will be offered due to our limited resources.
We will be holding synchronous class meetings so that you have the opportunity to ask questions and work together with other students. My goal is to spend the majority of class time clarifying difficult concepts and actively working through problems rather than delivering a traditional lecture. You will need to have a stable internet connection and a laptop or desktop computer so that you can participate.
There will be two types of participation questions. The first type is based on class exercises, which you will work on in pairs and answer with UT Instapoll via Canvas. The second type uses the Socratic method: I will ask individual students a series of questions about the material. Every student will be called on at least once during the term.
Excused absences may be given only for verifiable medical or family emergencies. Written documentation must be provided to qualify for an excused absence. The medical documentation must specifically state that you could not attend class due to your illness and must be signed by a physician. A job or internship interview or any other appointment does not constitute an excused absence.
The basic grading rubric consists of the four components listed below: