CS395T: Reinforcement Learning: Theory and Practice -- Fall 2004: Resources Page

Resources for Reinforcement Learning: Theory and Practice


Week 0 (8/26): Class Overview

  • Slides from 8/26: pdf.

Week 1 (8/31,9/2): Introduction

  • Slides from 8/31: pdf.

Week 2 (9/7,9/9): Evaluative Feedback

  • Slides from 9/7: pdf.
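
A minimal sketch of this week's evaluative-feedback material: the epsilon-greedy sample-average bandit method from Sutton & Barto, Chapter 2. The true arm values at the bottom are invented for illustration.

    import random

    def run_bandit(q_true, steps=1000, epsilon=0.1):
        # Epsilon-greedy action selection with incremental
        # sample-average value estimates (Sutton & Barto, Ch. 2).
        k = len(q_true)
        Q = [0.0] * k   # estimated action values
        N = [0] * k     # pull counts per arm
        for _ in range(steps):
            if random.random() < epsilon:
                a = random.randrange(k)                 # explore
            else:
                a = max(range(k), key=lambda i: Q[i])   # exploit
            r = random.gauss(q_true[a], 1.0)            # noisy reward
            N[a] += 1
            Q[a] += (r - Q[a]) / N[a]                   # incremental mean
        return Q

    # Made-up true arm values; arm 2 is best.
    print(run_bandit([0.2, -0.8, 1.5, 0.3]))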

Week 3 (9/14,16): The Reinforcement Learning Problem

  • Slides from 9/14: pdf.

Week 4 (9/21,23): Dynamic Programming

  • Slides from 9/21: pdf.
  • Slides from 9/23: pdf.
  • Email discussion on the Gambler's problem; a value-iteration sketch follows this list.
  • A paper on "The Complexity of Solving MDPs" (Littman, Dean, and Kaelbling, 1995).
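
A minimal value-iteration sketch for the Gambler's problem mentioned above, following the textbook's setup (head probability p_h = 0.4, goal of 100); the threshold theta is an arbitrary stopping choice.

    def gamblers_value_iteration(p_h=0.4, goal=100, theta=1e-9):
        # V[s] = estimated probability of reaching the goal from capital s;
        # V[goal] = 1 is the boundary condition standing in for the
        # +1 terminal reward.
        V = [0.0] * (goal + 1)
        V[goal] = 1.0
        while True:
            delta = 0.0
            for s in range(1, goal):
                # Stake at most what you have, and no more than needed to win.
                returns = [p_h * V[s + a] + (1 - p_h) * V[s - a]
                           for a in range(1, min(s, goal - s) + 1)]
                best = max(returns)
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < theta:
                break
        # Greedy policy; max() breaks ties toward the smallest stake.
        policy = [0] * (goal + 1)
        for s in range(1, goal):
            policy[s] = max(range(1, min(s, goal - s) + 1),
                            key=lambda a: p_h * V[s + a] + (1 - p_h) * V[s - a])
        return V, policy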

Week 5 (9/28,9/30): Monte Carlo Methods

  • Slides from 9/28: pdf.
  • Slides from 9/30: pdf.
  • A paper that addresses the relationship between first-visit and every-visit MC (Singh and Sutton, 1996). For the theoretical relationships, see the material starting at Section 3.3 (and the referenced appendices); a small sketch of both estimators follows this list.
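
A small companion sketch contrasting the two estimators from the Singh and Sutton paper. The (state, reward) episode format and undiscounted returns are assumptions made here for illustration.

    from collections import defaultdict

    def mc_estimates(episodes, first_visit=True):
        # episodes: list of episodes, each a list of (state, reward) pairs;
        # the return from step t is the sum of rewards from t onward.
        totals = defaultdict(float)   # summed returns credited to each state
        counts = defaultdict(int)     # number of returns credited
        for episode in episodes:
            G, returns = 0.0, []
            for _, r in reversed(episode):
                G += r
                returns.append(G)
            returns.reverse()         # returns[t] is the return from step t
            seen = set()
            for (s, _), G_t in zip(episode, returns):
                if first_visit and s in seen:
                    continue          # first-visit: credit only the first occurrence
                seen.add(s)
                totals[s] += G_t
                counts[s] += 1
        return {s: totals[s] / counts[s] for s in totals}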

Week 6 (10/5,7): Temporal Difference Learning

  • Slides from 10/5: pdf.
  • Slides from 10/7: pdf.
  • A couple of articles on the details of actor-critic in practice, by Tsitsiklis and by Williams; a tabular TD(0) sketch for this week's topic follows.
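
A hedged tabular TD(0) prediction sketch for this week's topic. The env interface here (reset() returning a state; step(s) returning a reward, next state, and termination flag under some fixed policy) is hypothetical, not from the readings.

    def td0(env, states, episodes=1000, alpha=0.1, gamma=1.0):
        V = {s: 0.0 for s in states}
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                r, s_next, done = env.step(s)
                target = r if done else r + gamma * V[s_next]
                V[s] += alpha * (target - V[s])   # TD(0) backup toward the target
                s = s_next
        return V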

Week 7 (10/12,14): Eligibility Traces

  • Slides from 10/12: pdf.
  • Slides from 10/14: pdf.
  • The equivalence of MC and first-visit TD(1) is proven in the same Singh and Sutton paper referenced above (Singh and Sutton, 1996); see the discussion starting at Section 2.4. A tabular TD(λ) sketch follows.
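
A tabular TD(lambda) sketch with accumulating eligibility traces, using the same hypothetical env interface as the TD(0) sketch above. Note that the cited equivalence is for first-visit TD(1) with offline updating; this online version is the standard practical form, not an exact match.

    def td_lambda(env, states, episodes=1000, alpha=0.1, gamma=1.0, lam=0.9):
        V = {s: 0.0 for s in states}
        for _ in range(episodes):
            e = {s: 0.0 for s in states}      # eligibility traces, reset per episode
            s, done = env.reset(), False
            while not done:
                r, s_next, done = env.step(s)
                target = r if done else r + gamma * V[s_next]
                delta = target - V[s]         # one-step TD error
                e[s] += 1.0                   # accumulating trace
                for x in states:
                    V[x] += alpha * delta * e[x]
                    e[x] *= gamma * lam       # decay every trace
                s = s_next
        return V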

Week 8 (10/19,21): Generalization and Function Approximation

  • Slides from 10/19: pdf.
  • Slides from 10/21: pdf.
  • The paper Igor presented in class: Dopamine: Generalization and Bonuses (Kakade and Dayan, 2002).
  • Andrew Smith's Applications of the Self-Organising Map to Reinforcement Learning
  • Bernd Fritzke's very clear Some Competitive Learning Methods
  • DemoGNG - a nice visual demo of competitive learning
  • Residual Algorithms: Reinforcement Learning with Function Approximation (1995) by Leemon Baird. More on the Baird counterexample, as well as an alternative to doing gradient descent on the MSE; see the linear TD(0) sketch after this list.
  • Boyan, J. A., and A. W. Moore, Generalization in Reinforcement Learning: Safely Approximating the Value Function. In Tesauro, G., D. S. Touretzky, and T. K. Leen (eds.), Advances in Neural Information Processing Systems 7 (NIPS). MIT Press, 1995. Another example of function approximation divergence and a proposed solution.
  • Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces (1998) Juan Carlos Santamaria, Richard S. Sutton, Ashwin Ram. Comparisons of several types of function approximators (including instance-based like Kanerva).
  • Least-Squares Temporal Difference Learning, by Justin Boyan.
  • A Convergent Form of Approximate Policy Iteration (2002) T. J. Perkins and D. Precup. A new convergence guarantee with function approximation.
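
To tie the readings above together, a minimal linear semi-gradient TD(0) sketch; phi(s), a feature map returning a list of floats, and the env interface are assumptions made here. With on-policy sampling this is the stable setting; off-policy updating on constructions like Baird's counterexample can diverge, as the Baird (1995) paper above shows.

    def linear_td0(env, phi, n_features, episodes=500, alpha=0.01, gamma=0.99):
        w = [0.0] * n_features

        def v(s):
            # Linear value estimate: dot product of weights and features.
            return sum(wi * fi for wi, fi in zip(w, phi(s)))

        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                r, s_next, done = env.step(s)
                target = r if done else r + gamma * v(s_next)
                delta = target - v(s)
                for i, fi in enumerate(phi(s)):
                    w[i] += alpha * delta * fi   # semi-gradient update on w
                s = s_next
        return w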

Week 9 (10/26,28): Planning and Learning

  • Slides from 10/26 (the planning slides): pdf.
  • Slides from 10/28: pdf.

Week 10 (11/2,4): Case Studies

  • Slides from 11/2: pdf.
  • Slides from 11/4: pdf.
  • ICML 2004 workshop on relational RL
  • Tony Cassandra's POMDPs for Dummies
  • Michael Littman's POMDP information page

Week 11 (11/9,11): Abstraction: Options and Hierarchy

  • Slides from 11/9: pdf.
  • Slides from 11/11: pdf.
  • Alex's discussion slides.
  • Jon's discussion slides.
  • Automatic Discovery of Subgoals in RL using Diverse Density by McGovern and Barto.
  • Improved Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning by Kretchmar et al.
  • The Journal version of the MaxQ paper
  • A follow-up paper on eliminating irrelevant variables within a subtask: State Abstraction in MAXQ Hierarchical Reinforcement Learning
  • Tom Dietterich's tutorial on abstraction.

Week 12 (11/16,18): Helicopter and Robot Control

  • Slides from 11/16: pdf.
  • Slides from 11/18: pdf.
  • Andrew Moore's tutorial on VC dimension
  • And a paper by him: A Nonparametric Approach to Noisy and Costly Optimization
  • PEGASUS: A policy search method for large MDPs and POMDPs, Andrew Y. Ng and Michael Jordan. In Uncertainty in Artificial Intelligence, Proceedings of the Sixteenth Conference, 2000.
  • A section from a David Cohn paper on locally weighted regression.
  • A page on locally weighted polynomial regression.
  • A good tutorial on memory-based learning (including material on kernels and LWPR) by Andrew Moore.

Week 13 (11/23): Robot Soccer

  • Slides from 11/23: pdf; the keepaway slides; and a few more.
  • Dieter Fox's mobile robotics page: project animations; landmark-based localization.
  • Michail G. Lagoudakis' page has a paper on LSPI as well as slides from his thesis defense about it.
  • The UT Austin Villa RoboCup team home page.
  • Greg Kuhlmann's follow-up on progress in 3v2 keepaway
  • Matt Taylor's recent paper on behavior transfer in keepaway.

Week 14 (11/30,12/2): Incorporating Advice

  • Slides from 11/30: pdf.
  • Slides from 12/2: pdf.
  • PILLAR
  • Pengo


    Page maintained by Peter Stone
    Questions? Send me mail