CS395T · Fall 2025

Lecture Schedule

Below is the tentative schedule for the course. Note that dates and topics may change as the semester progresses.

For paper presentation and review sign-ups, the schedule is also available in Excel format here.

Assignment 3 handed out

Lecture	Date	Topic	Materials/Readings	Assignments & Deadlines
1	8/26 (T)	Introduction (Slides)	AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery (Novikov et al., 2025) DeepSeek-V3 Technical Report (2024) AlphaGo
2	8/28 (Th)	Abstract NN & gradient computation (Slides)	Haykin Textbook - Introduction; Chapter 1: Perceptron; Chapter 4: Multi-layer Perceptrons (MLPs) Who invented backpropagation? (Schmidhuber, 2014) Tutorial on Autograd	Introduction post/survey due
3	9/2 (T)	DNNs, CNNs, RNNs, practical issues (Slides)	Deep Residual Learning for Image Recognition (ResNet) (Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015) Haykin Textbook - Chapter 15: RNNs Neural Machine Translation by Jointly Learning to Align & Translate (Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2014)	Presentation sign-up due (one paper presentation, one paper review)
4	9/4 (Th)	Attention, Transformers, LLMs (Slides)	Attention Is All You Need (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin, 2017) Understanding and Coding Self-Attention (Sebastian Raschka) How to Scale Your Model	Assignment 1 handed out
5	9/9 (T)	Presentations: Optimizing Attention (Slides)	FlashAttention (Dao et al., 2022) FlashAttention-3 (Shah et al., 2024)
6	9/11 (Th)	Monte Carlo methods & variance reduction (Slides)	Barto & Sutton – Ch 5 Monte Carlo Methods
7	9/16 (T)	Presentations: Optimizing LLMs	PagedAttention (Kwon et al., 2023) (Slides) DeepSeek-V3 Technical Report (2024) (Slides)
8	9/18 (Th)	Markov Decision Processes (MDPs) (Slides)	Recording Barto & Sutton – Ch 3 MDPs	Assignment 1 due
9	9/23 (T)	Sampling (TD(0), TD(n), Q-learning) (Slides)	Barto & Sutton – Ch 6 Temporal Difference Learning
10	9/25 (Th)	Sampling II (MC) (Slides)	Barto & Sutton – Ch 5 Monte Carlo Methods
11	9/30 (T)	Presentations: Deep Q-Networks (DQN), Hindsight Experience Replay	Human-level Control through Deep Reinforcement Learning (Mnih et al., 2015) (Slides) Hindsight Experience Replay (Andrychowicz et al., 2017) (Slides)	Project instructions released
12	10/2 (Th)	Policy gradients (I): REINFORCE (Slides)	Barto & Sutton – Ch 13: Policy Gradients OpenAI DRL – Intro to Policy Optimization Definitive Guide to Policy Gradients §1–2 (Matthias Lehmann, 2024) Deep RL: Pong from Pixels (Andrej Karpathy) Karpathy - Overview of Policy Gradients Video
13	10/7 (T)	Presentations: RL Environments	ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning (Michał Kempka et al., 2016) (Slides) Gymnasium (Slides)
14	10/9 (Th)	Policy gradients (II): Baseline methods (Slides)		Assignment 2 handed out
15	10/14 (T)	Presentations: Actor-Critics & DDPG	Asynchronous Methods for Deep Reinforcement Learning (Mnih et al., 2016) (Slides) Continuous Control with Deep Reinforcement Learning (DDPG) (Lillicrap et al., 2015) (Slides)	Project Proposal due (meeting required)
16	10/16 (Th)	Policy gradients (III): Trust-region methods (Slides)	Definitive Guide to Policy Gradients §3–4 (Matthias Lehmann, 2024) Trust Region Policy Optimization (TRPO) (John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, Pieter Abbeel, 2015) Proximal Policy Optimization (PPO) (John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, 2017)
17	10/21 (T)	Presentations: Policy Optimization Methods	DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Shao et al., 2024) (Slides) Simple Policy Optimization (Xie et al., 2025) (Slides)
18	10/23 (Th)	Reinforcement Learning from Human Feedback (RLHF) and Imitation Learning (Slides)	Reinforcement Learning from Human Feedback: Progress and Challenges (John Schulman, 2023) Training language models to follow instructions with human feedback (Long Ouyang et al., 2022) Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (Stephen Casper et al., 2023) Tutorial on Imitation Learning	Assignment 2 due
19	10/28 (T)	Presentations: RLHF	Deep Reinforcement Learning from Human Preferences (Paul F. Christiano et al., 2017) RLTF: Reinforcement Learning from Unit Test Feedback (Liu et al., 2023)
20	10/30 (Th)	Presentations: Imitation Learning	Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations (Rajeswaran et al., 2017) Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization (Finn et al., 2016)
21	11/4 (T)	Evolutionary Computation (Slides)	Neuroevolution (Risto Miikkulainen, 2022) Evolving neural networks through augmenting topologies (Kenneth O. Stanley, Risto Miikkulainen, 2002) Evolutionary Policy Optimization (Lain Mustafaoglu, Keshav Pingali, Risto Miikkulainen, 2025)
22	11/6 (Th)	Presentations: Applications of Evolutionary Computation	Evolution Strategies as a Scalable Alternative to Reinforcement Learning (Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever, 2017) Accelerating Evolution Through Gene Masking and Distributed Search (Hormoz Shahrzad, Risto Miikkulainen, 2023) AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery (Novikov et al., 2025)
23	11/11 (T)	Presentations: AI-Driven Research for Systems	Barbarians at the Gate: How AI is Upending Systems Research (Cheng et al., 2025)
24	11/13 (Th)	Presentations: ML for Systems (II)	The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition (Lange et al., 2025) A Survey of Ad Hoc Teamwork Research (Mirsky et al., 2022)	Project check-in (meeting required: 11/13, 11/15)
25	11/18 (T)	Presentations: Large-scale distributed RL	IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (Espeholt et al., 2018) SEED RL: Scalable & Efficient Deep-RL (Espeholt et al., 2019) Recurrent Experience Replay in Distributed Reinforcement Learning (Kapturowski et al., ICLR 2019)
26	11/20 (Th)	Presentations: Other RL Topics	The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games (Yu et al., 2022) Decision Transformer: Reinforcement Learning via Sequence Modeling (Chen et al., 2021)
Assignment 3 due (November 23, 2025)
THANKSGIVING BREAK
27	12/2 (T)	Project presentations
28	12/4 (Th)	Project presentations		Final project paper due (no extensions, no exceptions)