CS394R: Reinforcement Learning: Theory and Practice -- Spring 2011: Resources Page

Resources for Reinforcement Learning: Theory and Practice


Week 1: Class Overview, Introduction


Week 2: Evaluative Feedback

  • Vermorel and Mohri: Multi-Armed Bandit Algorithms and Empirical Evaluation. (A minimal epsilon-greedy bandit sketch follows this list.)
  • Rich Sutton's slides for Chapter 2: html.
  • Matt Taylor has done a lot of research on transfer learning for RL.
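To make the action-value ideas in the Chapter 2 readings concrete, here is a minimal epsilon-greedy bandit sketch in Python. The arm payoffs and parameter settings are made up for illustration; this is not code from any of the papers above.

import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000):
    # Sample-average action-value estimates with epsilon-greedy action selection.
    n_arms = len(true_means)
    q = [0.0] * n_arms          # estimated value of each arm
    counts = [0] * n_arms       # number of times each arm has been pulled
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                   # explore
        else:
            arm = max(range(n_arms), key=lambda i: q[i])     # exploit
        reward = random.gauss(true_means[arm], 1.0)          # noisy payoff (made up)
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]            # incremental sample average
        total_reward += reward
    return q, total_reward

# Example run with three made-up arms:
print(epsilon_greedy_bandit([0.2, 0.5, 0.8]))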

Week 3: The Reinforcement Learning Problem

  • Rich Sutton's slides for Chapter 3: pdf.

Week 4: Dynamic Programming

  • Email discussion on the Gambler's problem. (A small value-iteration sketch for this problem follows this list.)
  • A paper on the complexity of solving MDPs (Littman, Dean, and Kaelbling, 1995).
  • Pashenkova, Rish, and Dechter: Value Iteration and Policy Iteration Algorithms for Markov Decision Problems.
  • Rich Sutton's slides for Chapter 4: html.
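For reference alongside the Gambler's problem discussion, here is a small value-iteration sketch for that problem. It follows the Chapter 4 formulation; the probability of heads, the tolerance, and the tie-breaking rule below are illustrative choices, and tie-breaking is exactly the kind of detail the email discussion turns on.

def gamblers_value_iteration(p_h=0.4, goal=100, theta=1e-9):
    # Value iteration for the Gambler's Problem (Sutton & Barto, Chapter 4).
    # States are capital levels 1..goal-1; a stake wins with probability p_h.
    V = [0.0] * (goal + 1)      # terminal values stay 0; the +1 reward is handled below

    def backup(s, stake):
        win, lose = s + stake, s - stake
        reward = 1.0 if win == goal else 0.0     # reward only for reaching the goal
        return p_h * (reward + V[win]) + (1.0 - p_h) * V[lose]

    while True:
        delta = 0.0
        for s in range(1, goal):
            best = max(backup(s, a) for a in range(1, min(s, goal - s) + 1))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break

    # Greedy policy; ties are broken toward the smallest stake.
    policy = {s: min(range(1, min(s, goal - s) + 1),
                     key=lambda a: (-round(backup(s, a), 10), a))
              for s in range(1, goal)}
    return V, policy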

Week 5: Monte Carlo Methods

  • A paper that addresses the relationship between first-visit and every-visit MC (Singh and Sutton, 1996). For the theoretical relationships, see the section starting at Section 3.3 (and the referenced appendices). (A first-visit MC prediction sketch follows this list.)
  • Rich Sutton's slides for Chapter 5: html.
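As a concrete counterpart to the first-visit/every-visit discussion, here is a minimal first-visit Monte Carlo prediction sketch. The episode format is made up for illustration and is not taken from the paper.

from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    # First-visit Monte Carlo prediction. Each episode is a list of
    # (state, reward) pairs, where `reward` is the reward received on
    # leaving that state.
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = {}
    for episode in episodes:
        G = 0.0
        first_return = {}
        for state, reward in reversed(episode):
            G = gamma * G + reward
            first_return[state] = G   # overwritten as we move back, so the value that
                                      # remains is the return from the first visit
        for state, g in first_return.items():
            returns_sum[state] += g
            returns_count[state] += 1
            V[state] = returns_sum[state] / returns_count[state]
    return V

# Example with two short episodes of (state, reward) pairs:
print(first_visit_mc([[("A", 0), ("B", 1)], [("B", 0), ("A", 1)]]))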

Week 6: Temporal Difference Learning

  • A couple of articles on the details of actor-critic in practice, by Tsitsiklis and by Williams.
  • Sprague and Ballard: Multiple-Goal Reinforcement Learning with Modular Sarsa(0). (A tabular Sarsa(0) sketch follows this list.)
  • Rich Sutton's slides for Chapter 6: html.
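For reference, here is a tabular Sarsa(0) sketch of the kind covered in Chapter 6 and used (in modular form) by Sprague and Ballard. The environment interface it assumes is hypothetical, just to keep the sketch self-contained.

import random
from collections import defaultdict

def sarsa_zero(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular Sarsa(0). `env` is assumed to provide reset() -> state,
    # step(action) -> (next_state, reward, done), and a list env.actions.
    Q = defaultdict(float)

    def choose(state):
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state = env.reset()
        action = choose(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            if done:
                target = reward                                  # no bootstrapping past termination
                Q[(state, action)] += alpha * (target - Q[(state, action)])
            else:
                next_action = choose(next_state)                 # on-policy: use the action actually taken next
                target = reward + gamma * Q[(next_state, next_action)]
                Q[(state, action)] += alpha * (target - Q[(state, action)])
                state, action = next_state, next_action
    return Q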

Week 7: Eligibility Traces

  • The equivalence of MC and first-visit TD(1) is proven in the same Singh and Sutton paper referenced above (Singh and Sutton, 1996); see the discussion starting at Section 2.4.
  • Dayan: The Convergence of TD(λ) for General λ. (A tabular TD(λ) sketch with accumulating traces follows this list.)
  • Rich Sutton's slides for Chapter 7: html.
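To make the traces concrete, here is a tabular TD(λ) prediction sketch with accumulating eligibility traces. The episode format and parameter values are illustrative.

from collections import defaultdict

def td_lambda(episodes, alpha=0.1, gamma=1.0, lam=0.9):
    # Tabular TD(lambda) prediction with accumulating eligibility traces.
    # Each episode is a list of (state, reward, next_state) triples, with
    # next_state = None at termination.
    V = defaultdict(float)
    for episode in episodes:
        e = defaultdict(float)                               # eligibility traces
        for state, reward, next_state in episode:
            v_next = 0.0 if next_state is None else V[next_state]
            delta = reward + gamma * v_next - V[state]       # one-step TD error
            e[state] += 1.0                                   # accumulating trace
            for s in list(e):
                V[s] += alpha * delta * e[s]                  # credit all recently visited states
                e[s] *= gamma * lam                           # and decay their traces
    return V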

Week 8: Generalization and Function Approximation

  • Evolutionary Function Approximation by Shimon Whiteson.
  • Sridhar Mahadevan's proto-value functions
  • Dopamine: Generalization and Bonuses (2002), Kakade and Dayan.
  • Andrew Smith's Applications of the Self-Organising Map to Reinforcement Learning
  • Bernd Fritzke's very clear Some Competitive Learning Methods
  • DemoGNG - a nice visual demo of competitive learning
  • Residual Algorithms: Reinforcement Learning with Function Approximation (1995) Leemon Baird. More on the Baird counterexample as well as an alternative to doing gradient descent on the MSE.
  • Boyan, J. A., and A. W. Moore, Generalization in Reinforcement Learning: Safely Approximating the Value Function. In Tesauro, G., D. S. Touretzky, and T. K. Leen (eds.), Advances in Neural Information Processing Systems 7 (NIPS). MIT Press, 1995. Another example of function approximation divergence and a proposed solution.
  • Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces (1998) Juan Carlos Santamaria, Richard S. Sutton, Ashwin Ram. Comparisons of several types of function approximators (including instance-based like Kanerva).
  • Binary action search for learning continuous-action control policies (2009). Pazis and Lagoudakis.
  • Least-Squares Temporal Difference Learning, by Justin Boyan.
  • A Convergent Form of Approximate Policy Iteration (2002) T. J. Perkins and D. Precup. A new convergence guarantee with function approximation.
  • Moore and Atkeson: The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State Spaces.
  • Sherstov and Stone: Function Approximation via Tile Coding: Automating Parameter Choice. (A bare-bones tile-coding sketch follows this list.)
  • Chapman and Kaelbling: Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons.
  • Rich Sutton's slides for Chapter 8: html.
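Since tile coding comes up repeatedly this week, here is a bare-bones tile-coding sketch. It is not the coder studied in the Sherstov and Stone paper (which is about choosing these parameters automatically); it just shows the mechanics of overlapping, offset tilings, with all parameter choices picked for illustration.

def tile_indices(x, y, num_tilings=8, tiles_per_dim=8, lo=0.0, hi=1.0):
    # One active tile index per tiling for a point (x, y) in [lo, hi]^2.
    width = (hi - lo) / tiles_per_dim
    tiles_per_row = tiles_per_dim + 1          # offsetting needs one extra edge tile
    per_tiling = tiles_per_row ** 2
    active = []
    for t in range(num_tilings):
        offset = t * width / num_tilings       # each tiling is shifted by a fraction of a tile
        ix = min(int((x - lo + offset) / width), tiles_per_dim)
        iy = min(int((y - lo + offset) / width), tiles_per_dim)
        active.append(t * per_tiling + ix * tiles_per_row + iy)
    return active

# With a weight vector w, the linear value estimate is just
#     v(x, y) = sum(w[i] for i in tile_indices(x, y))
# and a semi-gradient TD update adds alpha * delta to each active weight.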

Week 9: Planning and Learning

  • Slides from 3/22: pdf. The planning ones.
  • Rich Sutton's slides for Chapter 9: html.
  • ICML 2004 workshop on relational RL
  • Sašo Džeroski, Luc De Raedt and Kurt Driessens: Relational Reinforcement Learning.

Week 10: Game Playing

  • Slides from 3/29: pdf.
  • The ones on minimax: ppt.
  • Slides from Gelly's thesis (I showed the UCT part in class): ppt.
  • Neural network slides (from Tom Mitchell's book)
  • Motif backgammon (online player)
  • GNU backgammon
  • Practical Issues in Temporal Difference Learning: an earlier paper by Tesauro (with a few more details)
  • A more complete overview of UCT as applied to Go: "Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go", Gelly and Silver, to appear in AIJ.
  • Some papers from Simon Lucas' group on comparing TD learning and co-evolution in various games: Othello; Go; Simple grid-world Treasure hunt; Ms. Pac-Man.

Week 11: Efficient Model-Based Learning

  • Slides from 4/7: pdf.
  • The ones on DBNs: ppt.
  • Slides from Gelly's thesis (I showed the UCT part in class): ppt.
  • Near-Optimal Reinforcement Learning in Polynomial Time
    Michael Kearns and Satinder Singh. (A rough model-based sketch in this spirit of optimistic exploration follows this list.)
  • Strehl et al.: PAC Model-Free Reinforcement Learning.
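As a very rough illustration of the model-based, optimism-under-uncertainty idea behind this line of work (not the actual algorithm from either paper above; both differ in their bookkeeping and guarantees), here is a sketch that plans with an empirical model and treats under-visited state-action pairs optimistically. The data structures and thresholds are made up for illustration.

from collections import defaultdict

def optimistic_model_planning(counts, reward_sums, next_counts, states, actions,
                              m=10, gamma=0.95, sweeps=200):
    # counts[(s, a)]            -> times (s, a) has been tried
    # reward_sums[(s, a)]       -> total reward observed for (s, a)
    # next_counts[(s, a)][s2]   -> observed transitions s, a -> s2
    # Pairs tried fewer than m times get the optimistic value 1 / (1 - gamma)
    # (assuming rewards in [0, 1]); well-visited pairs use the empirical model.
    v_max = 1.0 / (1.0 - gamma)
    V = defaultdict(float)
    for _ in range(sweeps):                      # simple fixed-sweep value iteration
        for s in states:
            best = float("-inf")
            for a in actions:
                n = counts.get((s, a), 0)
                if n < m:
                    q = v_max                    # unknown: assume the best
                else:
                    r_hat = reward_sums[(s, a)] / n
                    exp_next = sum(c / n * V[s2] for s2, c in next_counts[(s, a)].items())
                    q = r_hat + gamma * exp_next
                best = max(best, q)
            V[s] = best
    return V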

Week 12: Abstraction: Options and Hierarchy

  • Slides from 4/12: pdf. The ones from Matthew.
  • Slides from 4/14: pdf.
  • A page devoted to option discovery
  • Improved Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning by Kretchmar et al.
  • Nick Jong and Todd Hester's paper on the utility of temporal abstraction. The slides.
  • The journal version of the MAXQ paper
  • A follow-up paper on eliminating irrelevant variables within a subtask: State Abstraction in MAXQ Hierarchical Reinforcement Learning
  • Automatic Discovery and Transfer of MAXQ Hierarchies (from Dietterich's group - 2008)
  • Lihong Li, Thomas J. Walsh, and Michael L. Littman: Towards a Unified Theory of State Abstraction for MDPs. Ninth International Symposium on Artificial Intelligence and Mathematics, 2006.
  • Tom Dietterich's tutorial on abstraction.
  • Nick Jong's paper on state abstraction discovery. The slides.

Week 13: Robotics Applications

  • Slides from 4/19: pdf. The ones on walking Aibos. The ones on biped walking.
  • Slides from 4/21: pdf. Dan Lessin's.
  • Adaptive Choice of Grid and Time in Reinforcement Learning. Stephan Pareigis, NIPS 1997.
  • This paper compares the policy gradient RL method with other algorithms on the walk-learning task: Machine Learning for Fast Quadrupedal Locomotion. Kohl and Stone. AAAI 2004.
  • from Jan Peters' group: Learning Tetris Using the Noisy Cross-Entropy Method.
  • The original PEGASUS paper.
  • Some of the helicopter videos
  • Some other papers on helicopter control and soccer:
  • Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods.
    J. Bagnell and J. Schneider
    Proceedings of the International Conference on Robotics and Automation 2001, IEEE, May, 2001.
  • Scaling Reinforcement Learning toward RoboCup Soccer.
    Peter Stone and Richard S. Sutton.
    Proceedings of the Eighteenth International Conference on Machine Learning, pp. 537-544, Morgan Kaufmann, San Francisco, CA, 2001.
  • The UT Austin Villa RoboCup team home page.
  • Greg Kuhlmann's follow-up on progress in 3v2 keepaway
  • Kalyanakishnan et al.: Model-based Reinforcement Learning in a Complex Domain.
  • Reinforcement Learning for Sensing Strategies.
    C. Kwok and D. Fox.
    Proceedings of IROS, 2004.
  • Learning from Observation and Practice Using Primitives.
    Darrin Bentivegna, Christopher Atkeson, and Gordon Cheng.
    AAAI Fall Symposium on Real Life Reinforcement Learning, 2004.

Week 14: Least Squares Methods

  • Slides from 4/26: pdf. The ones on LSPI from Alan Fern (based on Ron Parr's). (A small LSTD sketch follows this list.)
  • Yaroslav's slides and some notes he wrote up on representing/solving MDPs in matrix notation.
  • Policy Iteration for Factored MDPs
    by Daphne Koller and Ronald Parr: UAI 2000
    Some related slides
  • Online Exploration in Least-Squares Policy Iteration
    by Lihong Li, Michael L. Littman, and Christopher R. Mansley: AAMAS 2009
    The slides from AAMAS 2009.
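Since LSPI is built around least-squares fits of this kind (LSTDQ, the action-value variant), here is a small batch LSTD sketch: it accumulates the matrix A and vector b over a batch of transitions and solves A w = b for the weights of a linear value estimate. The transition format and feature-function interface are illustrative, not from any of the papers above.

import numpy as np

def lstd(transitions, phi, n_features, gamma=0.99, ridge=1e-3):
    # Batch LSTD. `transitions` is an iterable of (state, reward, next_state)
    # with next_state = None at termination; `phi` maps a state to an
    # n_features-long numpy feature vector.
    A = ridge * np.eye(n_features)          # small ridge term keeps A invertible
    b = np.zeros(n_features)
    for s, r, s_next in transitions:
        f = phi(s)
        f_next = np.zeros(n_features) if s_next is None else phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)            # weights of the linear value estimate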

Week 15: Multiagent RL

  • Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents
    by Ming Tan

Final Project

    RL-Glue: http://glue.rl-community.org/wiki/Main_Page
    The following paper gives you an idea of what RL-Glue offers: pdf
    It is language-independent: you can write your agent program in any language of your choice. Its main purpose is to provide a standard platform on which everybody can test their RL algorithms and report results. (A schematic agent skeleton appears below.)
    More information can be found here: link
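To give a feel for what an RL-Glue agent looks like, here is a schematic of the agent-side callbacks the framework standardizes (agent_init / agent_start / agent_step / agent_end). The exact base class, observation and action types, and loader call come from the codec for your chosen language, so treat this as a shape sketch rather than working codec code.

import random

class RandomAgent:
    # Skeleton of the callbacks an RL-Glue agent implements; the method names
    # follow the RL-Glue agent interface, but everything around them (types,
    # registration with the glue) is left to the language codec.

    def agent_init(self, task_spec):
        self.actions = [0, 1]                     # made-up discrete action set

    def agent_start(self, observation):
        return random.choice(self.actions)        # first action of the episode

    def agent_step(self, reward, observation):
        # A learning agent would update its value estimates here.
        return random.choice(self.actions)

    def agent_end(self, reward):
        pass                                      # final update when the episode terminates

    def agent_cleanup(self):
        pass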

    PyBrain: link
    PyBrain, as its written-out name already suggests, contains algorithms for neural networks, for reinforcement learning (and the combination of the two), for unsupervised learning, and for evolution. Since most current problems deal with continuous state and action spaces, function approximators (such as neural networks) must be used to cope with the large dimensionality. The library is built around neural networks at its core, and all of the training methods accept a neural network as the instance to be trained.
    I believe this can be a good resource for those who are planning to work on continuous domains.


    Page maintained by Peter Stone
    Questions? Send me mail