CS394R: Reinforcement Learning: Theory and Practice -- Fall 2016: Assignments Page

Assignments for Reinforcement Learning: Theory and Practice

Things to do ASAP (before the first class if possible)

  • Join the class discussion group (see class main page).

  • Week 0 (8/25): Class Overview

  • If you would like to get a jump on the class, read the following:
  • Chapter 1 of the course textbook (2nd edition)
  • Sign up to lead a discussion. We will try to have at most one person per day unless the class size is large, so fill the first slot for each day before doubling up.

  • Week 1 (8/30): Introduction and Evaluative Feedback

    Jump to the resources page.

  • Chapter 1 (until the end of Section 1.6) of the textbook
  • Introduction to Part I (just one page)
  • Chapter 2 (derivation in Section 2.7 is optional)
  • Do your first programming assignment (by Thursday)
  • For each week, be sure to submit a question or comment about each reading by 5pm on Monday as an email in plain ASCII text. Send it in the body of the email rather than as an attachment, use the subject line "class readings for [due date]", and send it to Peter and Sanmit (pstone@cs and sanmit@cs). Include your name in the response, and if you refer explicitly to the reading, include page numbers. Details on expectations for reading responses are on the main class page. Example successful responses from a previous class are available on the sample responses page.
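Chapter 2 centers on evaluative feedback in the k-armed bandit setting. As a rough sketch of the kind of agent the first programming assignment might involve (the function name and parameters below are illustrative, not taken from the assignment), here is a minimal epsilon-greedy learner with incremental sample-average value estimates:

```python
import random

def run_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a Gaussian k-armed bandit,
    with incremental sample-average value estimates (Ch. 2 style)."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated value of each arm
    n = [0] * k     # pull count for each arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy arm
        r = rng.gauss(true_means[a], 1.0)          # noisy reward
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                  # incremental mean update
        total += r
    return q, total / steps

estimates, avg_reward = run_bandit([0.1, 0.5, 1.0])
```

With enough steps the greedy arm should be the one with the highest true mean, and the average reward should approach it (minus the cost of continued exploration).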

  • Week 2 (9/6): MDPs and Dynamic Programming

    Jump to the resources page.

  • Chapters 3 and 4 of the textbook (2nd edition)

  • Week 3 (9/13): Monte Carlo Methods and TD Learning

    Jump to the resources page.

  • Chapters 5 and 6 of the textbook

  • Week 4 (9/20): Multi-Step Bootstrapping and Planning

    Jump to the resources page.

  • Chapters 7 and 8 of the textbook

  • Week 5 (9/27): Approximate On-policy Prediction and Control

    Jump to the resources page.

  • Chapters 9 and 10 of the textbook

  • Week 6 (10/4): Approximate Off-policy Methods and Eligibility Traces

    Jump to the resources page.

  • Chapters 11 and 12 of the textbook.
    NOTE: These are still incomplete drafts, so please excuse any inconsistencies.
    Also, make sure you grab the "2015sep.pdf" version of the book from the class homepage.

  • Week 7 (10/11): Applications and Case Studies

    Jump to the resources page.

  • Chapter 16 of the textbook.
    NOTE: This is still an incomplete draft, so please excuse any inconsistencies.
    Also, make sure you grab the "2016sep.pdf" version of the book from the class homepage.
  • Class project proposal due at 11:59pm on Thursday. Please send an email (to the instructor and TA) with subject "Project Proposal" with a proposed topic for your class project. I anticipate projects taking one of two forms.
  • Practice (preferred): An implementation of RL in some domain of your choice - ideally one that you are using for research or in some other class. In this case, please describe the domain and your initial plans on how you intend to implement learning. What will the states and actions be? What algorithm(s) do you expect will be most effective?
  • Theory: A proposal, implementation, and testing of an algorithmic modification to an RL algorithm presented in the book. In this case, please describe the modification you propose to investigate and on what type of domain (possibly a toy domain) it is likely to show an improvement over the methods considered in the book.
  • See the project page for full details on the project.

  • Week 8 (10/18): Efficient Model-Based Exploration

    Jump to the resources page.

  • R-Max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
    Ronen Brafman and Moshe Tennenholtz
    The Journal of Machine Learning Research (JMLR) 2002
  • An Analysis of Model-Based Interval Estimation for Markov Decision Processes
    Alexander L. Strehl and Michael L. Littman
    The Machine Learning Journal (MLJ) 2008.
  • Model-Based Exploration in Continuous State Spaces
    Nicholas K. Jong and Peter Stone
    The Seventh Symposium on Abstraction, Reformulation, and Approximation, July 2007.

  • Week 9 (10/25): Abstraction: Options and Hierarchy

    Jump to the resources page.

  • Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.
    Sutton, R.S., Precup, D., Singh, S.
    Artificial Intelligence 112:181-211, 1999.
  • The MAXQ Method for Hierarchical Reinforcement Learning.
    Thomas G. Dietterich
    Proceedings of the 15th International Conference on Machine Learning, 1998.
  • Hierarchical Model-Based Reinforcement Learning: Rmax + MAXQ.
    Nicholas K. Jong and Peter Stone
    Proceedings of the 25th International Conference on Machine Learning, 2008.

  • Week 10 (11/1): Multiagent RL

    Jump to the resources page.

  • Markov Games as a Framework for Multi-Agent Reinforcement Learning
    Michael Littman
    ICML 1994.
  • Michael Bowling and Manuela Veloso
    Rational and Convergent Learning in Stochastic Games
    IJCAI 2001.
  • Doran Chakraborty and Peter Stone
    Convergence, Targeted Optimality and Safety in Multiagent Learning
    ICML 2010.
    journal version

  • Week 11 (11/8): Policy Gradient Methods

    Jump to the resources page.

  • Chapter 13 of the textbook.
    NOTE: This is still an incomplete draft, so please excuse any inconsistencies.
    Also, make sure you grab the "2016sep.pdf" version of the book from the class homepage.
  • Overview of Policy Gradient Methods by Jan Peters: http://www.scholarpedia.org/article/Policy_gradient_methods
  • Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion.
    Nate Kohl and Peter Stone
    In Proceedings of the IEEE International Conference on Robotics and Automation, May 2004.
  • Guided Policy Search
    Sergey Levine and Vladlen Koltun.
    ICML 2013.
    associated videos
  • Project literature review due at 11:59pm on Thursday. Please send an email (to the instructor and TA) with subject "Project literature review" containing the literature review for your class project.
    See the project page for full details on the project.
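The policy gradient readings above all build on the same log-likelihood-ratio update. As a hedged, self-contained illustration (names and hyperparameters here are my own choices, not from the readings), here is REINFORCE with a running-average baseline applied to a simple two-armed bandit with a softmax policy:

```python
import math
import random

def reinforce_bandit(true_means, episodes=5000, alpha=0.1, seed=0):
    """REINFORCE on a Gaussian bandit: softmax policy over arms,
    updated by theta[i] += alpha * (r - baseline) * d/dtheta log pi(a)."""
    rng = random.Random(seed)
    k = len(true_means)
    theta = [0.0] * k      # policy parameters (arm preferences)
    baseline = 0.0         # running-average reward baseline (variance reduction)
    for t in range(1, episodes + 1):
        exps = [math.exp(x) for x in theta]
        z = sum(exps)
        probs = [e / z for e in exps]              # softmax policy pi(a)
        a = rng.choices(range(k), weights=probs)[0]
        r = rng.gauss(true_means[a], 1.0)          # noisy reward
        baseline += (r - baseline) / t
        adv = r - baseline
        for i in range(k):
            # grad of log pi(a) w.r.t. theta[i] is 1{i == a} - probs[i]
            theta[i] += alpha * adv * ((1.0 if i == a else 0.0) - probs[i])
    exps = [math.exp(x) for x in theta]
    z = sum(exps)
    return [e / z for e in exps]                   # final policy

probs = reinforce_bandit([0.0, 1.0])
```

After training, the policy should place most of its probability on the higher-reward arm; the baseline is the standard variance-reduction trick discussed in the policy gradient literature.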

  • Week 12 (11/15): Inverse RL and Transfer Learning

    Jump to the resources page.

  • Apprenticeship Learning via Inverse Reinforcement Learning
    Pieter Abbeel and Andrew Ng
    ICML 2004.
  • An Introduction to Inter-task Transfer for Reinforcement Learning.
    Matthew E. Taylor and Peter Stone.
    AI Magazine, 32(1):15-34, 2011.

  • Week 13 (11/22): Deep RL

    Jump to the resources page.

  • Action-Conditional Video Prediction Using Deep Networks in ATARI Games.
    Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, and Satinder Singh.
    Neural Information Processing Systems, 2015.
    Appendix
    Videos

  • Week 14 (11/29): Project Demos

    Jump to the resources page.


    Final Project: due at 9:30am on Thursday, 12/8

    [Back to Department Homepage]

    Page maintained by Peter Stone
    Questions? Send me mail