This course teaches essential decision-making skills for scientists and business people using rapidly growing technologies known as predictive analytics, data mining and machine learning. These technologies focus on finding patterns in large data sets that might otherwise go undetected and then using these patterns for modeling and prediction. Predictive analytics, data mining and machine learning are especially important today because of advances in computational power and an explosion in the quantity of available data. Students will become familiar with data.world, statistical learning packages in R and similar capabilities in the Python scikit-learn package. Time permitting, students will also be introduced to "Deep Learning" using the Python TensorFlow package.
Working with data.world gives students the opportunity to collaborate around their specific projects, take advantage of the wealth of data readily available on the data.world platform, and learn how to store their data in a space available to them long after their time with the University.
R is the most popular free software environment for statistical computing. R supports all of the statistical learning methods taught in this course, and it is the environment of choice for research statisticians. This means that it is at the cutting edge with respect to new statistical learning methods.
The Python scikit-learn package is also a popular free software environment for machine learning. It provides a single, clean and consistent interface to many different statistical methods. It provides many options for each method, but also tries to choose sensible defaults. It also tries to help users understand the models as well as how to use them properly. And, like R, it is being actively developed by the Python machine learning community.
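As a rough illustration of that consistent interface, the sketch below fits three unrelated classifiers with the same fit and score calls, relying on each model's default settings. The iris data set and the particular classifiers are assumptions chosen for brevity; they are not taken from the course materials.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Small built-in data set, used here only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Three different methods, all created with their default options.
models = [
    KNeighborsClassifier(),
    GaussianNB(),
    DecisionTreeClassifier(random_state=0),
]

# Every estimator exposes the same fit / score methods.
scores = {}
for model in models:
    model.fit(X_train, y_train)
    scores[type(model).__name__] = model.score(X_test, y_test)

for name, accuracy in scores.items():
    print(f"{name}: test accuracy = {accuracy:.2f}")
```

Swapping in another scikit-learn classifier should only require changing the entries in the models list; the training and scoring loop stays the same.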
Deep learning libraries like Google's TensorFlow are also now gaining in popularity. "Deep Learning" is mainly concerned with training and using large neural networks. Neural networks are loosely patterned after the operation of neurons in the human brain.
Required Texts:
Recommended Texts:
Plus and minus grades will not be used for final course grades.
Grades will be calculated as follows:
Absences | Attendance points
0 | 30
1 | 25
2 | 15
3 or more | 0
Grading will be on a straight scale as follows:
A = 90-100% (248-275 pts)
B = 80-89% (220-247 pts)
C = 70-79% (193-219 pts)
D = 60-69% (165-192 pts)
F = < 60% (below 165 pts)
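For students who want to check their standing, the attendance points and the straight scale above can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, the 275-point total is inferred from the top of the A range, and treating each range's lower bound as a hard cutoff (with no rounding) is an assumption.

```python
# Assumption: the course total is 275 points, inferred from the A range.
TOTAL_POINTS = 275

# Attendance points by number of absences; 3 or more absences earn 0.
ATTENDANCE_POINTS = {0: 30, 1: 25, 2: 15}

# Minimum points for each letter grade, highest first.
GRADE_CUTOFFS = [(248, "A"), (220, "B"), (193, "C"), (165, "D")]


def attendance_points(absences: int) -> int:
    """Return the attendance points earned for a given number of absences."""
    if absences < 0:
        raise ValueError("absences cannot be negative")
    return ATTENDANCE_POINTS.get(absences, 0)


def letter_grade(points: float) -> str:
    """Map total course points to a letter grade on the straight scale."""
    if not 0 <= points <= TOTAL_POINTS:
        raise ValueError(f"points must be between 0 and {TOTAL_POINTS}")
    for cutoff, letter in GRADE_CUTOFFS:
        if points >= cutoff:
            return letter
    return "F"


print(attendance_points(2))  # 15
print(letter_grade(230))     # B
```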
If you miss something in class, you need to ask questions right then. You should practice what I teach in class as soon as possible after class and then, if you have problems, come to office hours, stay after class and/or post on Piazza. If you miss a class, it is your responsibility to catch up as quickly as possible. Procrastination is a killer in this class.
The way to improve your listening skills is to practice "active listening." This is where you make a conscious effort to hear not only the words that another person is saying but, more importantly, try to understand the complete message being sent.
In order to do this you must pay attention to the other person very carefully.
You cannot allow yourself to become distracted by whatever else may be going on around you, or by forming counterarguments that you'll make when the other person stops speaking. Nor can you allow yourself to get bored and lose focus on what the other person is saying. All of these contribute to a lack of listening and understanding.
If you encounter an unexpected medical or family emergency or a random act of Nature, or if extenuating circumstances cause you to have difficulty meeting the requirements of this course, fail to complete a project, and/or miss a quiz, please advise Dr. Cannata in writing (not email) during the week of Final Project presentations so that special consideration may be given. A file of all written correspondence will be kept by Dr. Cannata, and decisions regarding these requests will be made at the end of the semester after the initial final grades have been calculated.
Please note: the University does not consider a job interview as a valid reason for missing class.
Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities, 512-471-6259, http://diversity.utexas.edu/disability/
Week | Subject | Readings (best done prior to class) | Projects and Quizzes
1 | Class introduction and getting started with Statistical Learning | Statistical Learning - Chapter 1; Chapter 1 Videos |
2 | Introduction to Statistical Learning | Statistical Learning - Chapter 2; Chapter 2 Videos |
3 | Linear Regression | Statistical Learning - Chapter 3; Chapter 3 Videos; R for Data Science - Chapter 18 (Model Basics with modelr), page 345 (Introduction) to page 358 (end of Residuals) |
4 | Residual Analysis and Dichotomous Classification using Logistic Regression | R for Data Science - Chapter 19 (Model Building), page 375 (Introduction) to page 384 (end of "A More Complicated Model"); Statistical Learning - Chapter 4 (pages 127-137), the first 3 Chapter 4 Videos, and the Lab: Logistic Regression video | Project 1
5 | Classification using Linear Discriminant Analysis | Statistical Learning - Chapter 4; Chapter 4 Videos | Quiz 1
6 | The tidyverse dplyr package; ROC Curves; Other forms of Discriminant Analysis; Model Comparison; Communicate with R Markdown Interactive Documents | Data Transformation Cheat Sheet at RStudio Cheat Sheets; ROC Curve, Model comparison - Statistical Learning Chapter 4, pages 147-154; Statistical Learning - Chapter 4 and Chapter 4 Videos; R Markdown; R Markdown Cheat Sheet | Quiz 2
7 | Resampling Methods | Statistical Learning - Chapter 5; Chapter 5 Videos | Project 2
8 | Linear Model Selection and Regularization | Statistical Learning - Chapter 6 (Subset Selection and Shrinkage Methods); Chapter 6 Videos (Subset Selection and Shrinkage Methods) | Quiz 3
9 | SQL and Joining Data Tables with Census Data Tables; Principal Components Analysis (PCA) | Statistical Learning - Chapters 6 (PCA) and 10 (PCA); Chapters 6 (PCA) and 10 (PCA) Videos; Scikit-Learn Chapter 8 | Project 3
10 | Tree-Based Methods, and Moving Beyond Linearity | Statistical Learning - Chapters 8 and 7; Chapters 8 and 7 Videos; Scikit-Learn Chapters 6 and 7 | Quiz 4
11 | Support Vector Machines and Unsupervised Learning | Statistical Learning - Chapters 9 and 10; Chapters 9 and 10 Videos; Scikit-Learn Chapter 5 | Quiz 5
12 | Introduction to Neural Networks, Deep Learning, and TensorFlow | Scikit-Learn Part II - Chapters 9 and 10 | Project 4
13 | Statistical Learning Recap; Training DNNs | Scikit-Learn Part II - Chapter 11 | Quiz 6
14 | Convolutional Neural Networks (CNNs) | Scikit-Learn Part II - Chapter 13 | Quiz 7
15 | Final Project Reviews | Selected Project Presentations and Wrap-up | Quiz 8