CS394R: Reinforcement Learning: Theory and Practice -- Fall 2007: Resources Page

Resources for Reinforcement Learning: Theory and Practice


Week 0: Class Overview

  • Slides from 8/30: pdf.

Week 1: Introduction

  • Slides from 9/4, 9/6: pdf.

Week 2: Evaluative Feedback

  • Slides from 9/11: pdf.
  • Vermorel and Mohri: Multi-Armed Bandit Algorithms and Empirical Evaluation.
  • Rich Sutton's slides for Chapter 2: html.

Week 3: The Reinforcement Learning Problem

  • Slides from 9/18: pdf.
  • Dietterich: The MAXQ Method for Hierarchical Reinforcement Learning.
  • Jong and Stone: State Abstraction Discovery from Irrelevant State Variables.
  • Rich Sutton's slides for Chapter 3: pdf.

Week 4: Dynamic Programming

  • Slides from 9/25: pdf.
  • Email discussion on the Gambler's problem.
  • A paper on "The Complexity of solving MDPs" (Littman, Dean, and Kaelbling, 1995).
  • Tumer and Agogino: Distributed Agent-Based Air Traffic Flow Management.
  • Pashenkova, Rish, and Dechter: Value Iteration and Policy Iteration Algorithms for Markov Decision Problems.
  • Rich Sutton's slides for Chapter 4: html.

Week 5: Monte Carlo Methods

  • Slides from 10/2: pdf.
  • A paper that addresses the relationship between first-visit and every-visit MC (Singh and Sutton, 1996). For some theoretical relationships, see the material starting at Section 3.3 (and the referenced appendices).
  • Rich Sutton's slides for Chapter 5: html.

Week 6: Temporal Difference Learning

  • Slides from 10/9: pdf.
  • A couple of articles on the details of actor-critic in practice by Tsitsiklis and by Williams.
  • Sprague and Ballard: Multiple-Goal Reinforcement Learning with Modular Sarsa(0).
  • Rich Sutton's slides for Chapter 6: html.

Week 7: Eligibility Traces

  • Slides from 10/16: pdf.
  • The equivalence of MC and first-visit TD(1) is proven in the same Singh and Sutton paper referenced above (Singh and Sutton, 1996). See the material starting at Section 2.4.
  • Dayan: The Convergence of TD(λ) for General λ.
  • Rich Sutton's slides for Chapter 7: html.

Week 8: Generalization and Function Approximation

  • Slides from 10/23: pdf.
  • Dopamine: Generalization and Bonuses (2002) Kakade and Dayan.
  • Andrew Smith's Applications of the Self-Organising Map to Reinforcement Learning
  • Bernd Fritzke's very clear Some Competitive Learning Methods
  • DemoGNG - a nice visual demo of competitive learning
  • Residual Algorithms: Reinforcement Learning with Function Approximation (1995) Leemon Baird. More on the Baird counterexample as well as an alternative to doing gradient descent on the MSE.
  • Boyan, J. A., and A. W. Moore, Generalization in Reinforcement Learning: Safely Approximating the Value Function. In Tesauro, G., D. S. Touretzky, and T. K. Leen (eds.), Advances in Neural Information Processing Systems 7 (NIPS). MIT Press, 1995. Another example of function approximation divergence and a proposed solution.
  • Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces (1998) Juan Carlos Santamaria, Richard S. Sutton, Ashwin Ram. Comparisons of several types of function approximators (including instance-based like Kanerva).
  • Least-Squares Temporal Difference Learning. Justin Boyan.
  • A Convergent Form of Approximate Policy Iteration (2002) T. J. Perkins and D. Precup. A new convergence guarantee with function approximation.
  • On-line calculators of t-tests
  • Slides on Decision Trees from Tom Mitchell's book Machine Learning
  • Moore and Atkeson: The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State Spaces.
  • Sherstov and Stone: Function Approximation via Tile Coding: Automating Parameter Choice.
  • Chapman and Kaelbling: Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons.
  • Rich Sutton's slides for Chapter 8: html.

Week 9: Planning and Learning

  • Ng et al.: Autonomous helicopter flight via reinforcement learning.
  • Szita and Lörincz: Learning Tetris Using the Noisy Cross-Entropy Method.
  • Kalyanakrishnan et al.: Model-based Reinforcement Learning in a Complex Domain.
  • Strehl et al.: PAC Model-Free Reinforcement Learning.
  • Kearns and Singh: Near-Optimal Reinforcement Learning in Polynomial Time.
  • Rich Sutton's slides for Chapter 9: html.

Week 10: Case Studies

  • Slides from 11/6: pdf.
  • Doran's discussion slides and a related source: Leonid Kuvayev's Master's thesis.
  • Zhang and Dietterich's job-shop scheduling paper.
  • University of Michigan's successes of RL page
  • Tony Cassandra's POMDP for Dummies
  • Michael Littman's POMDP information page
  • ICML 2004 workshop on relational RL
  • Sašo Džeroski, Luc De Raedt and Kurt Driessens: Relational Reinforcement Learning.

Week 11: Abstraction: Options and Hierarchy

  • Slides from 11/13: pdf.
  • Slides from 11/15: pdf.
  • Sasha Sherstov's 2004 slides on option discovery.
  • A page devoted to option discovery
  • Improved Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning by Kretchmar et al.
  • The journal version of the MAXQ paper
  • A follow-up paper on eliminating irrelevant variables within a subtask: State Abstraction in MAXQ Hierarchical Reinforcement Learning
  • Tom Dietterich's tutorial on abstraction.
  • Nick Jong's paper on state abstraction discovery. The slides.

Week 12: Helicopter Control and Robot Soccer

  • Slides from 11/20: pdf.
  • The original PEGASUS paper.
  • Some other papers on helicopter control and soccer:
  • Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods.
    J. Bagnell and J. Schneider
    Proceedings of the International Conference on Robotics and Automation 2001, IEEE, May, 2001.
  • Scaling Reinforcement Learning toward RoboCup Soccer.
    Peter Stone and Richard S. Sutton.
    Proceedings of the Eighteenth International Conference on Machine Learning, pp. 537-544, Morgan Kaufmann, San Francisco, CA, 2001.
  • The UT Austin Villa RoboCup team home page.
  • Greg Kuhlmann's follow-up on progress in 3v2 keepaway
  • Reinforcement Learning for Sensing Strategies.
    C. Kwok and D. Fox.
    Proceedings of IROS, 2004.
  • Learning from Observation and Practice Using Primitives.
    Darrin Bentivegna, Christopher Atkeson, and Gordon Cheng.
    AAAI Fall Symposium on Real Life Reinforcement Learning, 2004.

Week 13: Adaptive Representations and Transfer Learning

  • Kenneth Stanley and Risto Miikkulainen: Efficient Evolution of Neural Network Topologies.

Week 14: Advice and Multiagent Reinforcement Learning

  • Slides from 12/4: pdf. The keepaway ones.
  • Slides from 12/6: pdf. The pursuit domain ones.
  • PILLAR
  • Pengo
  • A nice reading list on more advanced multiagent RL.
  • Gregory Kuhlmann, Peter Stone, Raymond Mooney, and Jude Shavlik: Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer.
  • Sonia Chernova and Manuela Veloso: Confidence-Based Policy Learning from Demonstration Using Gaussian Mixture Models.


    Page maintained by Peter Stone
    Questions? Send me mail