CS394R: Reinforcement Learning: Theory and Practice -- Spring 2013: Assignments Page

Assignments for Reinforcement Learning: Theory and Practice


Week 1 (1/16): Class Overview, Introduction

Jump to the resources page.

  • Chapter 1 of the textbook
  • For each reading, be sure to submit a question or comment about the reading by 1pm on the day before class, as an email in plain ASCII text. I prefer that it be sent in the body of the email rather than as an attachment. Please use the subject line "class readings for [due date]" and send to Peter and Sam (pstone@cs and sbarrett@cs). Please include your name in the response, and if you refer explicitly to the reading, please include page numbers. Details on expectations for reading responses are on the main class page. Example successful responses from a previous class are available on the sample responses page.

Week 2 (1/23): Evaluative Feedback and the RL Problem

Jump to the resources page.

  • Chapters 2 and 3 of the textbook (you may pay less attention to Sections 2.4 and 2.8-2.10)
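
    To make Chapter 2's evaluative-feedback ideas concrete, below is a minimal sketch of an epsilon-greedy action-value learner on a Bernoulli bandit. The arm probabilities, step count, and epsilon are invented for illustration; this is a study aid, not part of the assigned reading.

        import random

        def epsilon_greedy_bandit(arm_probs, steps=1000, epsilon=0.1, seed=0):
            """Sample-average action-value estimates with epsilon-greedy
            selection, in the style of Chapter 2. arm_probs are hypothetical
            Bernoulli reward probabilities, one per arm."""
            rng = random.Random(seed)
            n_arms = len(arm_probs)
            q = [0.0] * n_arms   # estimated value of each arm
            n = [0] * n_arms     # number of pulls of each arm
            total = 0.0
            for _ in range(steps):
                if rng.random() < epsilon:                    # explore
                    a = rng.randrange(n_arms)
                else:                                         # exploit
                    a = max(range(n_arms), key=lambda i: q[i])
                r = 1.0 if rng.random() < arm_probs[a] else 0.0
                n[a] += 1
                q[a] += (r - q[a]) / n[a]  # incremental sample-average update
                total += r
            return q, total

        # Example: three hypothetical arms; the learner should favor the 0.8 arm.
        estimates, reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
        print(estimates, reward)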

Week 3 (1/30): Dynamic Programming and Monte Carlo Methods

Jump to the resources page.

  • Chapters 4 and 5 of the textbook

Week 4 (2/6): TD Learning and Eligibility Traces

Jump to the resources page.

  • Chapters 6 and 7 of the textbook (you may pay less attention to Sections 6.7 and 7.7)
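
    As a companion to Chapter 6, here is a minimal tabular TD(0) prediction sketch on the five-state random walk used in the book; the step size, episode count, and other parameters below are illustrative assumptions, not prescribed values.

        import random

        def td0_random_walk(episodes=1000, alpha=0.1, gamma=1.0, seed=0):
            """Tabular TD(0) state-value prediction on the 5-state random
            walk: states 1..5, terminals 0 and 6, reward 1 only on reaching
            state 6 (Chapter 6 example)."""
            rng = random.Random(seed)
            v = {s: 0.5 for s in range(1, 6)}  # nonterminal value estimates
            for _ in range(episodes):
                s = 3  # every episode starts in the center state
                while s not in (0, 6):
                    s2 = s + rng.choice((-1, 1))   # step left or right
                    r = 1.0 if s2 == 6 else 0.0
                    v_next = v.get(s2, 0.0)        # terminals are worth 0
                    v[s] += alpha * (r + gamma * v_next - v[s])  # TD(0) update
                    s = s2
            return v

        # The true values are 1/6, 2/6, ..., 5/6 for states 1 through 5.
        print(td0_random_walk())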

Week 5 (2/13): Generalization and Planning

Jump to the resources page.

  • Chapters 8 and 9 of the textbook

Week 6 (2/20): Efficient Model-Based Learning

Jump to the resources page.

  • R-Max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
    Ronen Brafman and Moshe Tennenholtz
    Journal of Machine Learning Research, 2002.
    (A toy sketch of the R-max idea appears after this week's list.)
  • The Adaptive k-Meteorologists Problem and Its Application to Structure Learning and Feature Selection in Reinforcement Learning
    Carlos Diuk, Lihong Li, and Bethany R. Leffler
    ICML 2009
  • Model-Based Exploration in Continuous State Spaces
    Nicholas K. Jong and Peter Stone
    The Seventh Symposium on Abstraction, Reformulation, and Approximation, July 2007.
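
    As promised above, a toy sketch in the spirit of the R-max idea from Brafman and Tennenholtz: state-action pairs tried fewer than m times are valued as maximally rewarding, so planning on the optimistic model drives systematic exploration. The chain environment, m, and all parameters below are illustrative assumptions, not the paper's construction.

        def rmax_chain(n_states=6, m=3, rmax=1.0, gamma=0.95, episodes=40,
                       horizon=30):
            """Toy optimism-based explorer in the spirit of R-max, on a
            deterministic chain: action 1 moves right, action 0 moves left,
            and only the rightmost state pays rmax."""
            actions = (0, 1)
            counts = {}  # (s, a) -> visit count
            model = {}   # (s, a) -> (next_state, reward), stored once known

            def step(s, a):  # the true environment, hidden from the planner
                s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
                return s2, (rmax if s2 == n_states - 1 else 0.0)

            def plan():
                # Value iteration on the optimistic model: unknown pairs are
                # valued as if they led to an absorbing state paying rmax.
                q = {(s, a): 0.0 for s in range(n_states) for a in actions}
                for _ in range(100):
                    v = {s: max(q[(s, a)] for a in actions)
                         for s in range(n_states)}
                    for s in range(n_states):
                        for a in actions:
                            if counts.get((s, a), 0) >= m:
                                s2, r = model[(s, a)]
                                q[(s, a)] = r + gamma * v[s2]
                            else:
                                q[(s, a)] = rmax / (1 - gamma)  # optimistic
                return q

            for _ in range(episodes):
                s = 0
                q = plan()
                for _ in range(horizon):
                    a = max(actions, key=lambda a: q[(s, a)])
                    s2, r = step(s, a)
                    counts[(s, a)] = counts.get((s, a), 0) + 1
                    if counts[(s, a)] == m:
                        model[(s, a)] = (s2, r)  # deterministic: one sample
                        q = plan()  # replan whenever a pair becomes known
                    s = s2
            return counts

        # Every reachable pair ends up known; later pulls head right.
        print(rmax_chain())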

Week 7 (2/27): Bandits

Jump to the resources page.

  • Read sections 1, 2, 4, and 5 and the proof of Theorem 1 in Section 3. The proof of Theorem 3 and the appendices are optional. (A minimal code sketch of the UCB1 rule appears after this week's list.)
    UCB: Finite-time Analysis of the Multiarmed Bandit Problem
    Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer
    2002
  • Read sections 1, 2, 3.1, 4, and 5. The details of the proof (Sections 3.2-3.4) are optional.
    Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
    Emilie Kaufmann, Nathaniel Korda, and Rémi Munos
    2012
  • Read and understand the whole paper.
    UCT: Bandit based Monte-Carlo Planning
    Levente Kocsis and Csaba Szepesvári
    2006
  • Class project proposal due at 9:00am on Wednesday. Please send an email with subject "Project Proposal" with a proposed topic for your class project. I anticipate projects taking one of two forms.
  • Practice (preferred): An implementation of RL in some domain of your choice - ideally one that you are using for research or in some other class. In this case, please describe the domain and your initial plans on how you intend to implement learning. What will the states and actions be? What algorithm(s) do you expect will be most effective?
  • Theory: A proposal, implementation and testing of an algorithmic modification to an RL algorithm presented in the book. In this case, please describe the modification you propose to investigate and on what type of domain (possibly a toy domain) it is likely to show an improvement over things considered in the book.
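
    As referenced above, a minimal sketch of the UCB1 selection rule from the Auer, Cesa-Bianchi, and Fischer paper: play each arm once, then always pull the arm with the highest empirical mean plus sqrt(2 ln t / n_i). The Bernoulli arms and horizon are invented for illustration.

        import math
        import random

        def ucb1(arm_probs, horizon=10000, seed=0):
            """UCB1: after one initial pull per arm, choose the arm
            maximizing mean_i + sqrt(2 * ln t / n_i)."""
            rng = random.Random(seed)
            n_arms = len(arm_probs)

            def pull(i):  # hypothetical Bernoulli reward
                return 1.0 if rng.random() < arm_probs[i] else 0.0

            counts = [1] * n_arms
            means = [pull(i) for i in range(n_arms)]  # one pull per arm
            for t in range(n_arms + 1, horizon + 1):
                a = max(range(n_arms),
                        key=lambda i: means[i]
                        + math.sqrt(2 * math.log(t) / counts[i]))
                r = pull(a)
                counts[a] += 1
                means[a] += (r - means[a]) / counts[a]  # running mean
            return counts

        # Most pulls should concentrate on the 0.7 arm.
        print(ucb1([0.3, 0.5, 0.7]))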

Week 8 (3/6): Abstraction and Hierarchy

Jump to the resources page.

  • Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.
    Richard S. Sutton, Doina Precup, and Satinder Singh
    Artificial Intelligence 112:181-211, 1999.
  • The MAXQ Method for Hierarchical Reinforcement Learning.
    Thomas G. Dietterich
    Proceedings of the 15th International Conference on Machine Learning, 1998.
  • Hierarchical Model-Based Reinforcement Learning: Rmax + MAXQ.
    Nicholas K. Jong and Peter Stone
    Proceedings of the 25th International Conference on Machine Learning, 2008.

Week 9 (3/20): Game Playing

Jump to the resources page.

  • Tesauro, G., Temporal Difference Learning and TD-Gammon. Communications of the ACM, 1995
  • Pollack, J.B., & Blair, A.D. Co-evolution in the successful learning of backgammon strategy. Machine Learning, 1998
  • Tesauro, G. Comments on Co-Evolution in the Successful Learning of Backgammon Strategy. Machine Learning, 1998.
  • S. Gelly and D. Silver. Achieving Master-Level Play in 9x9 Computer Go. In Proceedings of the 23rd Conference on Artificial Intelligence, Nectar Track (AAAI-08), 2008.
  • Simulation-Based Approach to General Game Playing
    Hilmar Finnsson and Yngvi Björnsson
    AAAI 2008.

Week 10 (3/27): Financial Applications

Jump to the resources page.

  • Learning to trade via direct reinforcement
    John Moody and Matthew Saffell
    IEEE Transactions on Neural Networks, 2001.
  • Reinforcement learning for optimized trade execution
    Yuriy Nevmyvaka, Yi Feng, and Michael Kearns
    ICML 2006

Week 11 (4/3): Robotics

Jump to the resources page.

  • Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion.
    Nate Kohl and Peter Stone
    In Proceedings of the IEEE International Conference on Robotics and Automation, May 2004.
  • Autonomous helicopter flight via reinforcement learning.
    Andrew Ng, H. Jin Kim, Michael Jordan and Shankar Sastry.
    In S. Thrun, L. Saul, and B. Schoelkopf (Eds.), Advances in Neural Information Processing Systems (NIPS) 17, 2004.
  • Autonomous reinforcement learning on raw visual input data in a real world application.
    Sascha Lange, Martin Riedmiller, Arne Voigtländer.
    IJCNN 2012.

Week 12 (4/10): Practical RL and Speedup Methods

Jump to the resources page.

  • Characterizing Reinforcement Learning Methods through Parameterized Learning Problems
    Shivaram Kalyanakrishnan and Peter Stone.
    Machine Learning (MLJ), 84(1-2):205-247, July 2011.
  • An Introduction to Inter-task Transfer for Reinforcement Learning.
    Matthew E. Taylor and Peter Stone.
    AI Magazine, 32(1):15-34, 2011.

Week 13 (4/17): Health and Sustainability

Jump to the resources page.

  • Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
    Arthur Guez, Robert D. Vincent, Massimo Avoli, Joelle Pineau.
    IAAI 2008
  • PAC Optimal Planning for Invasive Species Management: Improved Exploration for Reinforcement Learning from Simulator-Defined MDPs.
    Thomas G. Dietterich, Majid Taleghan, and Mark Crowley
    AAAI 2013.
  • Design, Analysis, and Learning Control of a Fully Actuated Micro Wind Turbine.
    J. Zico Kolter, Zachary Jackowski, Russ Tedrake
    American Control Conference 2012.

Week 14 (4/24): Least Squares Methods

Jump to the resources page.

  • Technical update: Least-squares temporal difference learning
    Justin A. Boyan
    Machine Learning, 2002.
    (A minimal LSTD sketch appears after this week's list.)
  • Model-Free Least-Squares Policy Iteration
    Michail G. Lagoudakis and Ronald Parr
    Proceedings of NIPS*2001: Neural Information Processing Systems: Natural and Synthetic, Vancouver, BC, December 2001, pp. 1547-1554.
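
    As noted above, a minimal batch LSTD sketch in the spirit of the Boyan paper: given samples (phi(s), r, phi(s')), it solves A w = b with A = sum phi (phi - gamma phi')^T and b = sum phi r, so that V(s) is approximated by phi(s) . w. The features and samples below are invented for illustration.

        import numpy as np

        def lstd(samples, gamma=0.9, ridge=1e-6):
            """Batch least-squares temporal difference learning: estimate
            weights w with V(s) ~= phi(s) . w from (phi, reward, phi') samples."""
            k = len(samples[0][0])
            A = ridge * np.eye(k)  # small ridge term keeps A invertible
            b = np.zeros(k)
            for phi, r, phi_next in samples:
                phi = np.asarray(phi, dtype=float)
                phi_next = np.asarray(phi_next, dtype=float)
                A += np.outer(phi, phi - gamma * phi_next)
                b += r * phi
            return np.linalg.solve(A, b)

        # Hypothetical 2-state chain with one-hot features: state 0 -> state 1
        # (reward 0), state 1 -> state 1 (reward 1). With gamma = 0.9 the true
        # values are V(1) = 10 and V(0) = 9.
        samples = [([1, 0], 0.0, [0, 1]), ([0, 1], 1.0, [0, 1])]
        print(lstd(samples))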

Week 15 (5/1): Multiagent RL

Jump to the resources page.

  • Markov Games as a Framework for Multi-Agent Reinforcement Learning
    Michael Littman
    ICML 1994.
  • Rational and Convergent Learning in Stochastic Games
    Michael Bowling and Manuela Veloso
    IJCAI 2001.
  • New Criteria and a New Algorithm for Learning in Multi-Agent Systems
    Rob Powers and Yoav Shoham
    NIPS 2004 (journal version also available).

Final Project: due at 9:00am on Wednesday, 5/8


    Page maintained by Peter Stone
    Questions? Send me mail