CS329e Elements of Data Visualization

Instructor: Dr. Philip E. Cannata, cannata@cs.utexas.edu
Office hours: After class in the classroom or by appointment

TA: Jiacheng Zhuo, jzhuo@cs.utexas.edu

Class Times: TTh 5:00 - 6:30 p.m.
Class Location: PHR 2.110

Class Website: http://www.cs.utexas.edu/~cannata/dataVis/

Course Description:

This course teaches the essential and practical skills necessary to communicate information about data clearly and effectively through graphical means. Rendering data clearly and effectively with appropriate visual analytics reduces the time required to achieve understanding and helps in managing the ever-growing amount of available digital data. Students learn to use the data.world platform and various software tools including Tableau, SQL, R, dplyr, ggplot, and Shiny.

Working with data.world is an obvious choice for this class, as the platform gives students the opportunity to collaborate around their specific projects, take advantage of the wealth of data readily available on the platform, and learn how to store their data in a space available to them long after their time with the University.

Tableau is one of the most popular commercial data visualization tools on the market today. Students learn how to use this tool to quickly analyze and visualize their data and to share the information they discover on the Web.

R is the most popular free software environment for statistical computing and graphics. ggplot2 is a data visualization package for R that can be used to produce publication-quality graphics. In this course, students learn how to use R and ggplot2 not only to produce production-quality graphics but also to produce large multiplot images (i.e., dozens and dozens of different plots in one image) that can be used as a standardized form of analysis.

Shiny is a web application framework for R that allows students to turn their analysis into interactive web applications without writing HTML, CSS, or JavaScript.

data.world is the environment that ties all of this together via its flexible connectors and APIs. It also makes it possible for students to archive their data so that their analysis is available long after the semester ends. Students in follow-on semesters can quickly pick up the previous semester's projects and carry them forward.

You will be expected to write programs using SQL and R in this class.

Prerequisites:

Required Text:

Top Hat Textbook - "Tableau and R Data Visualization" by Dr. Philip Cannata (ISBN: 978-1-77330-498-4) - Available only through enrollment in a Top Hat class.

Highly Recommended Texts:

Grading:

Plus and minus grades will not be used for final course grades.

Grades will be calculated as follows:

  • Attendance and Active Participation: 75 points (30 points for Attendance and 45 points for Active Participation, which covers Class Participation and Chapter Questions).
  • Top Hat will be used during class to take Attendance and to facilitate and grade Class Participation.
  • The Extenuating Circumstances section below does not apply to your absences and class participation grade unless you have a well-documented medical or family emergency, or religious obligation.
  • I have noticed over many semesters that class attendance is strongly correlated with getting a good grade in my classes. To encourage class attendance, the 30 points for Attendance will be assigned as follows:
    Absences     Attendance points
    0            30
    1            25
    2            15
    3 or more    0
  • Attendance will be taken starting 5-10 minutes before class and ending 2 minutes after the class start time.
  • If you arrive more than 2 minutes after the class start time, you will be marked absent.
  • If you are unsure whether your attendance was recorded before class, please put your name on Dr. Cannata's attendance sheet at the front of the class before class begins.
  • There will be 8 quizzes over the course of the semester, each worth 10 points for a total of 80 points.
  • There will be 4 lab projects over the course of the semester, each worth 20 points for a total of 80 points. Students will work in groups of 3 or 4 on these lab projects. Groups will be expected to complete the lab projects on their own using the provided code when given. Class examples and examples from the web can be used, but these must be documented. To be clear, each group must do a significant amount of the work on the lab projects by themselves and each group member must contribute equally to the group's work.
    • Requirements Documents will be given for each Project.
    • There will be no grading rubric given for the projects until after the projects are graded.
    • A major component of the grade for each project will be based upon "finding interesting things about your data".
    • Another major component of the grade for each member of a project group will be based upon documenting your work as data.world "insights" each day over the course of time between when the project is assigned and when it is due. In other words, procrastination will be heavily penalized on project grades on an individual group member basis. This means that each member of a project group could get a different grade for a project based upon his or her effort on the project.
    • Everything you are taught at any point in this class must be learned, retained, and used on subsequent projects. In other words, you can't learn something, use it in a project, and then forget it.
  • Each group will have to complete a separate final project, which will be worth 40 points.

Grading will be on a straight scale as follows:

            A =   100 - 90%    (248-275 pts)
            B =    89 - 80%    (220-247 pts)
            C =    79 - 70%    (193-219 pts)
            D =    69 - 60%    (165-192 pts)
            F =      < 60%     (below 165 pts)
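
As a rough illustration of how the point values and the straight scale above fit together, the following Python sketch adds up one student's points and looks up the letter grade. The point values and cutoffs are copied from this syllabus; the function names, the example scores, and the choice to compare a total against the listed lower bound of each letter are assumptions made for this sketch, not the instructor's actual grading procedure.

```python
# Illustrative sketch only: names and example scores are hypothetical.
# Point values and cutoffs are copied from the syllabus.

# Attendance points by number of absences; 3 or more absences earn 0.
ATTENDANCE_POINTS = {0: 30, 1: 25, 2: 15}

# 75 (attendance + participation) + 8 quizzes * 10 + 4 labs * 20 + 40 (final)
MAX_POINTS = 75 + 8 * 10 + 4 * 20 + 40

# Lowest point total listed for each letter grade on the straight scale.
GRADE_FLOORS = [("A", 248), ("B", 220), ("C", 193), ("D", 165)]


def attendance_points(absences):
    """Return the attendance points for a given number of absences."""
    if absences < 0:
        raise ValueError("absences cannot be negative")
    return ATTENDANCE_POINTS.get(absences, 0)


def total_points(absences, participation, quizzes, labs, final_project):
    """Sum every graded component into a single point total."""
    return (attendance_points(absences) + participation
            + sum(quizzes) + sum(labs) + final_project)


def letter_grade(points):
    """Map a point total to a letter (assumes comparison against each floor)."""
    for letter, floor in GRADE_FLOORS:
        if points >= floor:
            return letter
    return "F"


# Hypothetical student: 1 absence, 40/45 participation, 8 quizzes, 4 labs.
example_total = total_points(
    absences=1,
    participation=40,
    quizzes=[9, 8, 10, 7, 9, 10, 8, 9],
    labs=[18, 17, 19, 20],
    final_project=36,
)
print(example_total, "of", MAX_POINTS, "->", letter_grade(example_total))
```

For this hypothetical student the total is 245 of 275 points, which falls in the B range. Note that a single absence costs 5 attendance points and a third absence removes all 30.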

Active Class Participation:

  • If you miss something in class, you need to ask questions right then.
  • You should practice what I teach in class as soon as possible after class; if you have problems, stay after class and/or post on Piazza.
  • If you miss a class, it is your responsibility to catch up as quickly as possible.
  • Procrastination is a killer in this class.
  • You should practice "active listening" during class. This is where you make a conscious effort not only to hear the words that another person is saying but, more importantly, to understand the complete message being sent. In order to do this:
    • You must prepare yourself for the conversation. In the case of this class, that means completing the reading assignments and answering the Chapter Questions on or before the due date.
    • You must pay attention to the other person very carefully.
    • You must not allow yourself to become distracted by whatever else may be going on around you, or by forming counterarguments that you'll make when the other person stops speaking.
    • You must not allow yourself to get bored and lose focus on what the other person is saying.

All of these contribute to good listening and understanding.

Extenuating Circumstances:

If you encounter an unexpected medical or family emergency or a random act of Nature, and/or you have difficulty meeting the requirements of this course, fail to complete a project, or miss a quiz because of extenuating circumstances, please advise Dr. Cannata in writing (not email) during the week of Final Project presentations so that special consideration MIGHT be given. A file of all written correspondence will be kept by Dr. Cannata, and decisions regarding these requests will be made at the end of the semester after the initial final grades have been calculated.

Please note: the University does not consider a job interview as a valid reason for missing class.

Students with disabilities:

Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities, 512-471-6259, http://diversity.utexas.edu/disability/

Course Topics:

Data Exploration Methodology, Introduction to:

  • Tableau Public
  • R, RStudio, and Interactive Documents
  • data.world

Exploring Data with Boxplots, plus:

  • Tableau Pages
  • Tableau Date Levels

Exploring Data with Histograms, plus:

  • Tableau
    • Analytics Tab
    • Formatting
    • Changing the Aggregate Function in Tableau
    • Dual-Axis Plots
  • R
    • The tidyverse dplyr Package
    • Formatting
    • Dual-Axis Plots

Exploring Data with Bar Charts, plus:

  • Tableau
    • Table Calculations
    • Sets
    • Packed Bubble Charts
    • Treemap Charts
  • R
    • dplyr::mutate() function
    • Table Calculations

Exploring Data with Scatter Plots, plus:

  • Tableau
    • Maps with Value Corrections
    • Actions
    • Dashboards
    • ANOVA Models
    • Stories
  • R
    • Leaflet Maps
    • Choropleth Maps

Spring Break

Exploring Data with Crosstabs, plus:

  • Tableau
    • Calculated Fields
    • Parameters
    • KPIs
  • R
    • SQL Case Statement
    • KPIs

Advanced Topics

  • Level of Detail Calculations
  • Statistical Learning
  • More Charts