Course Overview
Rapid advances in machine learning have enabled us to build systems with capabilities that were unimaginable just a few years ago. Examples include AlphaGo, which plays the board game Go better than any human player; DeepSeekCoder, which can generate programs in more than 300 computer languages; and AlphaEvolve, which can discover advanced algorithms. The goal of this course is to analyze the key breakthroughs that underlie these kinds of systems and to understand how they can be used to build systems for solving other problems.
Lectures will cover three main machine learning technologies used in these systems: deep neural networks, reinforcement learning, and evolutionary computing. Unlike in a typical machine learning course, this material will be presented using PL/systems concepts such as dataflow analysis. Lectures will be complemented each week by student presentations of key papers in these areas from recent AI/ML conferences. Some of these papers go into greater depth on the core AI/ML technologies, while others are case studies that analyze how those technologies are deployed in systems for board games, multiplayer games, coding assistants, and algorithm discovery.
Prerequisites
- Proficiency in Python
- Familiarity with undergraduate-level algorithms and data structures
- Familiarity with calculus, statistics, and linear algebra, and strong mathematical skills
- Coursework or equivalent experience in AI and ML at the level of CS342 is strongly recommended. Practical experience with training machine learning models will be useful.
Coursework
- Class presentations 25%
- Class participation 10%
- 3-4 programming assignments 25%
- A substantial term project 40%
Academic Honesty
You may discuss concepts with classmates, but all written work and programming assignments must be your own work, or your project team's work when teamwork is permitted. You may not search online for existing implementations of algorithms related to the programming assignments, even as a reference. Students caught cheating will automatically fail the course and will be reported to the university. If in doubt about the ethics of any particular action, talk to the instructor or the TA.
Course Staff
Course Outcomes
By the end of this course, students will be able to:
- Build a solid foundation in the core technologies that underlie modern AI/ML systems, including deep neural networks, reinforcement learning, and evolutionary computing, while understanding their limitations.
- Explain how these technologies are deployed in modern AI/ML systems.
- Evaluate and use advanced AI/ML technologies to build new systems.
Lecture Schedule
Below is the tentative schedule for the course. Note that dates and topics may change as the semester progresses.
Lecture | Date | Topic | Materials/Readings | Assignments & Deadlines |
---|---|---|---|---|
1 | 8/26 (T) | Introduction | | |
2 | 8/28 (Th) | Abstract NN & gradient computation | | Introduction post/survey due |
3 | 9/2 | DNNs, CNNs, RNNs, practical issues | CNNs | |
4 | 9/4 | Sparsity in DNNs | | |
5 | 9/9 | Attention, Transformers, LLMs | | |
6 | 9/11 | Presentations: Training LLMs | | |
7 | 9/16 | Monte Carlo methods & variance reduction | Sutton & Barto – Ch 5 Monte Carlo Methods | Assignment 1 due: Train a model with a PEFT technique |
8 | 9/18 | MDPs | Sutton & Barto – Ch 3 MDPs | |
9 | 9/23 | Sampling (TD(0), TD(n), MC, Q-learning) | Sutton & Barto – Ch 6 Temporal Difference Learning | |
10 | 9/25 | Presentations: Planning (MCTS/AlphaGo) | | |
11 | 9/30 | Policy gradients (I): REINFORCE | | |
12 | 10/2 | Presentations: Double DQN, DQN, HER | | |
13 | 10/7 | Policy gradients (II): Baseline methods | TBD | Assignment 2 due: Grid-world (≥ 2 methods) or Pong (MLP + CNN) |
14 | 10/9 | Presentations: A2C/A3C & DDPG | | |
15 | 10/14 | Policy gradients (III): Trust-region methods | | Project ideas due (meeting required) |
16 | 10/16 | Presentations: Applications of Advanced Policy Gradient methods (DeepSeek/GRPO) | | |
17 | 10/21 | Reinforcement Learning from Human Feedback (RLHF) | | |
18 | 10/23 | Presentations: RLHF | | |
19 | 10/28 | Evolutionary Computation | | Assignment 3 due: Implement an actor-critic method of choice |
20 | 10/30 | Presentations: Applications of Evolutionary Computation (AlphaEvolve, etc.) | | |
21 | 11/4 | Imitation Learning | Tutorial on Imitation Learning | Project check-in #1 (meeting required) |
22 | 11/6 | Presentations: Imitation Learning | | |
23 | 11/11 | Parallel/Distributed RL | Tutorial | Assignment 4 due: Evolve a neural network or implement a genetic algorithm |
24 | 11/13 | Presentations: Large-scale distributed RL | | |
25 | 11/18 | Project check-ins | | |
26 | 11/20 | Project check-ins | | |
THANKSGIVING BREAK | | | | |
27 | 12/2 | Project presentations | | |
28 | 12/4 | Project presentations | | Final project paper due |
Assignments
- Programming Assignment 1 (release TBA)
- Programming Assignment 2 (release TBA)
- Programming Assignment 3 (release TBA)
- Final Project (milestones TBA)
Resources
- Canvas
- Ed Discussion
- Reading List
- Reinforcement Learning: An Introduction (2nd edition) by Sutton & Barto
- Neural Networks and Learning Machines, Third Edition by Simon Haykin