CS394R: Reinforcement Learning: Theory and Practice -- Spring 2013: Assignments Page

Assignments for Reinforcement Learning: Theory and Practice


Week 1 (1/16): Class Overview, Introduction

Jump to the resources page.

  • Chapter 1 of the textbook
  • For each reading, be sure to submit a question or comment about the reading by 1pm on the day before class, as an email in plain ASCII text. I prefer that it be sent in the body of the email rather than as an attachment. Please use the subject line "class readings for [due date]" and send to Peter and Sam (pstone@cs and sbarrett@cs). Please include your name in the response, and if you refer explicitly to the reading, please include page numbers. Details on expectations for reading responses are on the main class page. Example successful responses from a previous class are available on the sample responses page.

Week 2 (1/23): Evaluative Feedback and the RL Problem

Jump to the resources page.

  • Chapters 2 and 3 of the textbook (you may pay less attention to Sections 2.4 and 2.8-2.10)
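
    To make Chapter 2's evaluative-feedback ideas concrete, below is a minimal sketch of an epsilon-greedy action-value learner on a Bernoulli bandit. The arm probabilities, step count, and epsilon are invented for illustration; this is a study aid, not part of the assigned reading.

        import random

        def epsilon_greedy_bandit(arm_probs, steps=1000, epsilon=0.1, seed=0):
            """Sample-average action-value estimates with epsilon-greedy
            selection, in the style of Chapter 2. arm_probs are hypothetical
            Bernoulli reward probabilities, one per arm."""
            rng = random.Random(seed)
            n_arms = len(arm_probs)
            q = [0.0] * n_arms   # estimated value of each arm
            n = [0] * n_arms     # number of pulls of each arm
            total = 0.0
            for _ in range(steps):
                if rng.random() < epsilon:                    # explore
                    a = rng.randrange(n_arms)
                else:                                         # exploit
                    a = max(range(n_arms), key=lambda i: q[i])
                r = 1.0 if rng.random() < arm_probs[a] else 0.0
                n[a] += 1
                q[a] += (r - q[a]) / n[a]  # incremental sample-average update
                total += r
            return q, total

        # Example: three hypothetical arms; the learner should favor the 0.8 arm.
        estimates, reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
        print(estimates, reward)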

Week 3 (1/30): Dynamic Programming and Monte Carlo Methods

Jump to the resources page.

  • Chapters 4 and 5 of the textbook

Week 4 (2/6): TD Learning and Eligibility Traces

Jump to the resources page.

  • Chapters 6 and 7 of the textbook (you may pay less attention to Sections 6.7 and 7.7)
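
    As a companion to Chapter 6, here is a minimal tabular TD(0) prediction sketch on the five-state random walk used in the book; the step size, episode count, and other parameters below are illustrative assumptions, not prescribed values.

        import random

        def td0_random_walk(episodes=1000, alpha=0.1, gamma=1.0, seed=0):
            """Tabular TD(0) state-value prediction on the 5-state random
            walk: states 1..5, terminals 0 and 6, reward 1 only on reaching
            state 6 (Chapter 6 example)."""
            rng = random.Random(seed)
            v = {s: 0.5 for s in range(1, 6)}  # nonterminal value estimates
            for _ in range(episodes):
                s = 3  # every episode starts in the center state
                while s not in (0, 6):
                    s2 = s + rng.choice((-1, 1))   # step left or right
                    r = 1.0 if s2 == 6 else 0.0
                    v_next = v.get(s2, 0.0)        # terminals are worth 0
                    v[s] += alpha * (r + gamma * v_next - v[s])  # TD(0) update
                    s = s2
            return v

        # The true values are 1/6, 2/6, ..., 5/6 for states 1 through 5.
        print(td0_random_walk())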

Week 5 (2/13): Generalization and Planning

Jump to the resources page.

  • Chapters 8 and 9 of the textbook

Week 6 (2/20): Efficient Model-Based Learning

Jump to the resources page.

  • R-Max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
    Ronen Brafman and Moshe Tennenholtz
    Journal of Machine Learning Research, 2002.
    (A toy sketch of the R-max idea appears after this week's list.)
  • The Adaptive k-Meteorologists Problem and Its Application to Structure Learning and Feature Selection in Reinforcement Learning
    Carlos Diuk, Lihong Li, and Bethany R. Leffler
    ICML 2009
  • Model-Based Exploration in Continuous State Spaces
    Nicholas K. Jong and Peter Stone
    The Seventh Symposium on Abstraction, Reformulation, and Approximation, July 2007.
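
    As promised above, a toy sketch in the spirit of the R-max idea from Brafman and Tennenholtz: state-action pairs tried fewer than m times are valued as maximally rewarding, so planning on the optimistic model drives systematic exploration. The chain environment, m, and all parameters below are illustrative assumptions, not the paper's construction.

        def rmax_chain(n_states=6, m=3, rmax=1.0, gamma=0.95, episodes=40,
                       horizon=30):
            """Toy optimism-based explorer in the spirit of R-max, on a
            deterministic chain: action 1 moves right, action 0 moves left,
            and only the rightmost state pays rmax."""
            actions = (0, 1)
            counts = {}  # (s, a) -> visit count
            model = {}   # (s, a) -> (next_state, reward), stored once known

            def step(s, a):  # the true environment, hidden from the planner
                s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
                return s2, (rmax if s2 == n_states - 1 else 0.0)

            def plan():
                # Value iteration on the optimistic model: unknown pairs are
                # valued as if they led to an absorbing state paying rmax.
                q = {(s, a): 0.0 for s in range(n_states) for a in actions}
                for _ in range(100):
                    v = {s: max(q[(s, a)] for a in actions)
                         for s in range(n_states)}
                    for s in range(n_states):
                        for a in actions:
                            if counts.get((s, a), 0) >= m:
                                s2, r = model[(s, a)]
                                q[(s, a)] = r + gamma * v[s2]
                            else:
                                q[(s, a)] = rmax / (1 - gamma)  # optimistic
                return q

            for _ in range(episodes):
                s = 0
                q = plan()
                for _ in range(horizon):
                    a = max(actions, key=lambda a: q[(s, a)])
                    s2, r = step(s, a)
                    counts[(s, a)] = counts.get((s, a), 0) + 1
                    if counts[(s, a)] == m:
                        model[(s, a)] = (s2, r)  # deterministic: one sample
                        q = plan()  # replan whenever a pair becomes known
                    s = s2
            return counts

        # Every reachable pair ends up known; later pulls head right.
        print(rmax_chain())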

Week 7 (2/27): Bandits

Jump to the resources page.

  • Read sections 1, 2, 4, and 5 and the proof of Theorem 1 in Section 3. The proof of Theorem 3 and the appendices are optional. (A minimal code sketch of the UCB1 rule appears after this week's list.)
    UCB: Finite-time Analysis of the Multiarmed Bandit Problem
    Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer
    2002
  • Read sections 1, 2, 3.1, 4, and 5. The details of the proof (Sections 3.2-3.4) are optional.
    Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
    Emilie Kaufmann, Nathaniel Korda, and Rémi Munos
    2012
  • Read and understand the whole paper.
    UCT: Bandit based Monte-Carlo Planning
    Levente Kocsis and Csaba Szepesvári
    2006
  • Class project proposal due at 9:00am on Wednesday. Please send an email with subject "Project Proposal" with a proposed topic for your class project. I anticipate projects taking one of two forms.
  • Practice (preferred): An implementation of RL in some domain of your choice - ideally one that you are using for research or in some other class. In this case, please describe the domain and your initial plans on how you intend to implement learning. What will the states and actions be? What algorithm(s) do you expect will be most effective?
  • Theory: A proposal, implementation and testing of an algorithmic modification to an RL algorithm presented in the book. In this case, please describe the modification you propose to investigate and on what type of domain (possibly a toy domain) it is likely to show an improvement over things considered in the book.
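
    As referenced above, a minimal sketch of the UCB1 selection rule from the Auer, Cesa-Bianchi, and Fischer paper: play each arm once, then always pull the arm with the highest empirical mean plus sqrt(2 ln t / n_i). The Bernoulli arms and horizon are invented for illustration.

        import math
        import random

        def ucb1(arm_probs, horizon=10000, seed=0):
            """UCB1: after one initial pull per arm, choose the arm
            maximizing mean_i + sqrt(2 * ln t / n_i)."""
            rng = random.Random(seed)
            n_arms = len(arm_probs)

            def pull(i):  # hypothetical Bernoulli reward
                return 1.0 if rng.random() < arm_probs[i] else 0.0

            counts = [1] * n_arms
            means = [pull(i) for i in range(n_arms)]  # one pull per arm
            for t in range(n_arms + 1, horizon + 1):
                a = max(range(n_arms),
                        key=lambda i: means[i]
                        + math.sqrt(2 * math.log(t) / counts[i]))
                r = pull(a)
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]  # running mean
            return counts

        # Most pulls should concentrate on the 0.7 arm.
        print(ucb1([0.3, 0.5, 0.7]))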

Week 8 (3/6): Abstraction and Hierarchy

Jump to the resources page.

  • Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.
    Richard S. Sutton, Doina Precup, and Satinder Singh
    Artificial Intelligence 112:181-211, 1999.
  • The MAXQ Method for Hierarchical Reinforcement Learning.
    Thomas G. Dietterich
    Proceedings of the 15th International Conference on Machine Learning, 1998.
  • Hierarchical Model-Based Reinforcement Learning: Rmax + MAXQ.
    Nicholas K. Jong and Peter Stone
    Proceedings of the 25th International Conference on Machine Learning, 2008.

Week 9 (3/20): Game Playing

Jump to the resources page.

  • Tesauro, G., Temporal Difference Learning and TD-Gammon. Communications of the ACM, 1995
  • Pollack, J.B., & Blair, A.D. Co-evolution in the successful learning of backgammon strategy. Machine Learning, 1998
  • Tesauro, G. Comments on Co-Evolution in the Successful Learning of Backgammon Strategy. Machine Learning, 1998.
  • S. Gelly and D. Silver. Achieving Master-Level Play in 9x9 Computer Go. In Proceedings of the 23rd Conference on Artificial Intelligence, Nectar Track (AAAI-08), 2008.
  • Simulation-Based Approach to General Game Playing
    Hilmar Finnsson and Yngvi Björnsson
    AAAI 2008.

Week 10 (3/27): Financial Applications

Jump to the resources page.

  • Learning to trade via direct reinforcement
    John Moody and Matthew Saffell
    IEEE Transactions on Neural Networks, 2001.
  • Reinforcement learning for optimized trade execution
    Yuriy Nevmyvaka, Yi Feng, and Michael Kearns
    ICML 2006

Week 11 (4/3): Robotics

Jump to the resources page.

  • Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion.
    Nate Kohl and Peter Stone
    In Proceedings of the IEEE International Conference on Robotics and Automation, May 2004.
  • Autonomous helicopter flight via reinforcement learning.
    Andrew Ng, H. Jin Kim, Michael Jordan and Shankar Sastry.
    In S. Thrun, L. Saul, and B. Schoelkopf (Eds.), Advances in Neural Information Processing Systems (NIPS) 17, 2004.
  • Autonomous reinforcement learning on raw visual input data in a real world application.
    Sascha Lange, Martin Riedmiller, Arne Voigtländer.
    IJCNN 2012.

Week 12 (4/10): Practical RL and Speedup Methods

Jump to the resources page.

  • Characterizing Reinforcement Learning Methods through Parameterized Learning Problems
    Shivaram Kalyanakrishnan and Peter Stone.
    Machine Learning (MLJ), 84(1-2):205-247, July 2011.
  • An Introduction to Inter-task Transfer for Reinforcement Learning.
    Matthew E. Taylor and Peter Stone.
    AI Magazine, 32(1):15-34, 2011.

Week 13 (4/17): Health and Sustainability

Jump to the resources page.

  • Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
    Arthur Guez, Robert D. Vincent, Massimo Avoli, Joelle Pineau.
    IAAI 2008
  • PAC Optimal Planning for Invasive Species Management: Improved Exploration for Reinforcement Learning from Simulator-Defined MDPs.
    Thomas G. Dietterich, Majid Taleghan, and Mark Crowley
    AAAI 2013.
  • Design, Analysis, and Learning Control of a Fully Actuated Micro Wind Turbine.
    J. Zico Kolter, Zachary Jackowski, Russ Tedrake
    American Control Conference 2012.

Week 14 (4/24): Least Squares Methods

Jump to the resources page.

  • Technical update: Least-squares temporal difference learning
    Justin A. Boyan
    Machine Learning, 2002.
    (A minimal LSTD sketch appears after this week's list.)
  • Model-Free Least-Squares Policy Iteration
    Michail G. Lagoudakis and Ronald Parr
    Proceedings of NIPS*2001: Neural Information Processing Systems: Natural and Synthetic, Vancouver, BC, December 2001, pp. 1547-1554.
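
    As noted above, a minimal batch LSTD sketch in the spirit of the Boyan paper: given samples (phi(s), r, phi(s')), it solves A w = b with A = sum phi (phi - gamma phi')^T and b = sum phi r, so that V(s) is approximated by phi(s) . w. The features and samples below are invented for illustration.

        import numpy as np

        def lstd(samples, gamma=0.9, ridge=1e-6):
            """Batch least-squares temporal difference learning: estimate
            weights w with V(s) ~= phi(s) . w from (phi, reward, phi') samples."""
            k = len(samples[0][0])
            A = ridge * np.eye(k)  # small ridge term keeps A invertible
            b = np.zeros(k)
            for phi, r, phi_next in samples:
                phi = np.asarray(phi, dtype=float)
                phi_next = np.asarray(phi_next, dtype=float)
                A += np.outer(phi, phi - gamma * phi_next)
                b += r * phi
            return np.linalg.solve(A, b)

        # Hypothetical 2-state chain with one-hot features: state 0 -> state 1
        # (reward 0), state 1 -> state 1 (reward 1). With gamma = 0.9 the true
        # values are V(1) = 10 and V(0) = 9.
        samples = [([1, 0], 0.0, [0, 1]), ([0, 1], 1.0, [0, 1])]
        print(lstd(samples))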

Week 15 (5/1): Multiagent RL

Jump to the resources page.

  • Markov Games as a Framework for Multi-Agent Reinforcement Learning
    Michael Littman
    ICML 1994.
  • Rational and Convergent Learning in Stochastic Games
    Michael Bowling and Manuela Veloso
    IJCAI 2001.
  • New Criteria and a New Algorithm for Learning in Multi-Agent Systems
    Rob Powers and Yoav Shoham
    NIPS 2004 (journal version also available).

Final Project: due at 9:00am on Wednesday, 5/8


    Page maintained by Peter Stone
    Questions? Send me mail