CS394R: Reinforcement Learning: Theory and Practice -- Spring 2013: Resources Page

Resources for Reinforcement Learning: Theory and Practice


Week 1: Class Overview, Introduction

  • Slides from week 1: pdf.

Week 2: Evaluative Feedback and the RL Problem

  • Slides from 1/23: pdf.
  • Elad's discussion slides
  • Vermorel and Mohri: Multi-Armed Bandit Algorithms and Empirical Evaluation.
  • Shivaram Kalyanakrishnan and Peter Stone: Efficient Selection of Multiple Bandit Arms: Theory and Practice. In ICML 2010. Here are some related slides.
  • An RL reading list from Shivaram Kalyanakrishnan.
  • Rich Sutton's slides for Chapter 2: html.
  • Rich Sutton's slides for Chapter 3: pdf.
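  • A minimal sketch (not from the readings above) of the epsilon-greedy, sample-average action-value method from Chapter 2, assuming Gaussian rewards for illustration:

      import random

      def run_bandit(true_means, steps=1000, epsilon=0.1):
          k = len(true_means)
          Q = [0.0] * k          # action-value estimates
          N = [0] * k            # pull counts per arm
          total_reward = 0.0
          for _ in range(steps):
              if random.random() < epsilon:
                  a = random.randrange(k)                # explore
              else:
                  a = max(range(k), key=lambda i: Q[i])  # exploit
              r = random.gauss(true_means[a], 1.0)       # noisy reward
              N[a] += 1
              Q[a] += (r - Q[a]) / N[a]                  # incremental sample average
              total_reward += r
          return Q, total_reward

      if __name__ == "__main__":
          print(run_bandit([0.2, 0.5, 0.8]))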

Week 3: Dynamic Programming and Monte Carlo Methods

  • Slides from 1/30: pdf
  • Email discussion on the Gambler's problem.
  • A paper on "On the Complexity of solving MDPs" (Littman, Dean, and Kaelbling, 1995).
  • Pashenkova, Rish, and Dechter: Value Iteration and Policy Iteration Algorithms for Markov Decision Problems.
  • Rich Sutton's slides for Chapter 4: html.
  • A paper that addresses the relationship between first-visit and every-visit MC (Singh and Sutton, 1996). For the theoretical relationships, see the section starting at Section 3.3 (and the referenced appendices).
  • Rich Sutton's slides for Chapter 5: html.
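  • A minimal value iteration sketch (not from the readings above) over an MDP given as a transition table, to accompany the Chapter 4 material; the tiny two-state example at the end is made up for illustration:

      # P[s][a] is a list of (probability, next_state, reward) triples.
      def value_iteration(P, gamma=0.9, theta=1e-8):
          V = {s: 0.0 for s in P}
          while True:
              delta = 0.0
              for s in P:
                  v = V[s]
                  V[s] = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                             for a in P[s])
                  delta = max(delta, abs(v - V[s]))
              if delta < theta:
                  return V

      # Two-state example: action 0 stays (reward 0), action 1 switches states (reward 1).
      P = {0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
           1: {0: [(1.0, 1, 0.0)], 1: [(1.0, 0, 1.0)]}}
      print(value_iteration(P))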

Week 4: Temporal Difference Learning and Eligibility Traces

  • Slides from 2/6: pdf.
  • A couple of articles on the details of actor-critic in practice by Tsitsiklis and by Williams.
  • Sprague and Ballard: Multiple-Goal Reinforcement Learning with Modular Sarsa(0).
  • Harm van Seijen, Hado van Hasselt, Shimon Whiteson, and Marco Wiering, A Theoretical and Empirical Analysis of Expected Sarsa. In ADPRL 2009.
  • Rich Sutton's slides for Chapter 6: html.
  • The equivalence of MC and first-visit TD(1) is proven in the same Singh and Sutton paper referenced above (Singh and Sutton, 1996); see Section 2.4 onward.
  • Dayan: The Convergence of TD(λ) for General λ.
  • Rich Sutton's slides for Chapter 7: html.
  • A Q-learning video
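  • A minimal tabular Q-learning sketch (not from the readings above) on a made-up five-state chain, just to show the Chapter 6 update rule in code:

      import random

      def step(s, a):
          # Deterministic chain: states 0..4, action 0 = left, action 1 = right,
          # reward 1 for reaching state 4, which ends the episode.
          s2 = max(0, s - 1) if a == 0 else min(4, s + 1)
          r = 1.0 if s2 == 4 else 0.0
          return s2, r, s2 == 4

      Q = {(s, a): 0.0 for s in range(5) for a in range(2)}
      alpha, gamma, epsilon = 0.1, 0.95, 0.1
      for episode in range(500):
          s, done = 0, False
          while not done:
              if random.random() < epsilon:
                  a = random.randrange(2)
              else:
                  a = max((0, 1), key=lambda act: Q[(s, act)])
              s2, r, done = step(s, a)
              target = r if done else r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
              Q[(s, a)] += alpha * (target - Q[(s, a)])   # Q-learning update
              s = s2
      print(sorted(Q.items()))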

Week 5: Generalization and Planning

  • Slides from 2/13: pdf.
  • The ones on keepaway.
  • The ones on planning.
  • Evolutionary Function Approximation by Shimon Whiteson.
  • Sridhar Mahadevan's proto-value functions
  • Dopamine: generalization and Bonuses (2002) Kakade and Dayan.
  • Andrew Smith's Applications of the Self-Organising Map to Reinforcement Learning
  • Bernd Fritzke's very clear Some Competitive Learning Methods
  • DemoGNG - a nice visual demo of competitive learning
  • Residual Algorithms: Reinforcement Learning with Function Approximation (1995) Leemon Baird. More on the Baird counterexample as well as an alternative to doing gradient descent on the MSE.
  • Boyan, J. A., and A. W. Moore, Generalization in Reinforcement Learning: Safely Approximating the Value Function. In Tesauro, G., D. S. Touretzky, and T. K. Leen (eds.), Advances in Neural Information Processing Systems 7 (NIPS). MIT Press, 1995. Another example of function approximation divergence and a proposed solution.
  • Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces (1998) Juan Carlos Santamaria, Richard S. Sutton, Ashwin Ram. Comparisons of several types of function approximators (including instance-based like Kanerva).
  • Binary action search for learning continuous-action control policies (2009). Pazis and Lagoudakis.
  • Least-Squares Temporal Difference Learning Justin Boyan.
  • A Convergent Form of Approximate Policy Iteration (2002) T. J. Perkins and D. Precup. A new convergence guarantee with function approximation.
  • Moore and Atkeson: The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State Spaces.
  • Sherstov and Stone: Function Approximation via Tile Coding: Automating Parameter Choice.
  • Chapman and Kaelbling: Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons.
  • Rich Sutton's slides for Chapter 8: html.
  • Rich Sutton's slides for Chapter 9: html.
  • ICML 2004 workshop on relational RL
  • Sašo Džeroski, Luc De Raedt and Kurt Driessens: Relational Reinforcement Learning.
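  • A minimal sketch (not from the readings above) of the semi-gradient TD(0) update with a linear function approximator, where the value estimate is the dot product of a weight vector w and a feature vector phi(s); the binary features in the example stand in for something like the active tiles of a tile coding:

      def td0_linear_update(w, phi_s, phi_s2, r, done, alpha=0.01, gamma=0.99):
          # One semi-gradient TD(0) step on the weight vector w.
          v_s = sum(wi * xi for wi, xi in zip(w, phi_s))
          v_s2 = 0.0 if done else sum(wi * xi for wi, xi in zip(w, phi_s2))
          delta = r + gamma * v_s2 - v_s                       # TD error
          return [wi + alpha * delta * xi for wi, xi in zip(w, phi_s)]

      # Example with three binary features.
      w = [0.0, 0.0, 0.0]
      w = td0_linear_update(w, [1, 0, 1], [0, 1, 1], r=1.0, done=False)
      print(w)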

Week 6: Efficient Model-Based Learning

  • Slides from 4/7: pdf.
  • The ones on DBNs: ppt.
  • some Rmax slides
  • Slides and video for the k-meteorologists paper
  • Code for Fitted RMax.
  • Near-Optimal Reinforcement Learning in Polynomial Time
    Michael Kearns and Satinder Singh
  • Strehl et al.: PAC Model-Free Reinforcement Learning.
  • Efficient Structure Learning in Factored-state MDPs
    Alexander L. Strehl, Carlos Diuk, and Michael L. Littman
    AAAI'2007
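  • A rough sketch (an assumption-laden illustration, not the authors' code) of the R-max idea behind the model-based readings above: state-action pairs visited fewer than m times are treated as "unknown" and modelled optimistically, so planning with the empirical model drives the agent to explore them:

      from collections import defaultdict

      R_MAX, M_KNOWN = 1.0, 5                  # optimistic reward, "known" threshold
      counts = defaultdict(int)                # visits to (s, a)
      reward_sum = defaultdict(float)          # summed observed reward for (s, a)
      next_counts = defaultdict(int)           # visits to (s, a, s')

      def record(s, a, r, s2):
          counts[(s, a)] += 1
          reward_sum[(s, a)] += r
          next_counts[(s, a, s2)] += 1

      def model(s, a, states):
          """Return (expected reward, {next_state: prob}) for the planner."""
          n = counts[(s, a)]
          if n < M_KNOWN:                      # unknown: optimistic self-loop
              return R_MAX, {s: 1.0}
          r_hat = reward_sum[(s, a)] / n
          p_hat = {s2: next_counts[(s, a, s2)] / n
                   for s2 in states if next_counts[(s, a, s2)] > 0}
          return r_hat, p_hat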

Week 7: Bandits

  • An Empirical Evaluation of Thompson Sampling
    Olivier Chapelle and Lihong Li
    NIPS 2011
  • Slides by Sylvain Gelly on UCT
  • Slides by Alan Fern on Monte Carlo Tree Search and UCT
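  • A minimal Thompson-sampling sketch for Bernoulli bandits (in the spirit of the Chapelle and Li paper above; the arm probabilities in the example are made up): keep a Beta(wins+1, losses+1) posterior per arm, sample once from each, and pull the arm with the largest sample:

      import random

      def thompson(true_probs, steps=1000):
          k = len(true_probs)
          wins, losses = [0] * k, [0] * k
          for _ in range(steps):
              samples = [random.betavariate(wins[i] + 1, losses[i] + 1)
                         for i in range(k)]
              a = samples.index(max(samples))          # arm with best posterior draw
              if random.random() < true_probs[a]:
                  wins[a] += 1
              else:
                  losses[a] += 1
          return wins, losses

      print(thompson([0.3, 0.5, 0.7]))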

Week 8: Abstraction: Options and Hierarchy

  • Slides from 3/6: pdf
  • Sasha Sherstov's 2004 slides on option discovery.
  • A page devoted to option discovery
  • Improved Automatic Discovery of Subgoals for Options in Hierarchical Reinforcement Learning by Kretchmar et al.
  • Nick Jong and Todd Hester's paper on the utility of temporal abstraction. The slides.
  • The Journal version of the MaxQ paper
  • A follow-up paper on eliminating irrelevant variables within a subtask: State Abstraction in MAXQ Hierarchical Reinforcement Learning
  • Automatic Discovery and Transfer of MAXQ Hierarchies (from Dietterich's group - 2008)
  • Lihong Li and Thomas J. Walsh and Michael L. Littman, Towards a Unified Theory of State Abstraction for MDPs, Ninth International Symposium on Artificial Intelligence and Mathematics, 2006.
  • Tom Dietterich's tutorial on abstraction.
  • Nick Jong's paper on state abstraction discovery. The slides.
  • Nick Jong's Thesis code repository and annotated slides

Week 9: Game Playing

  • Slides from 3/20: pdf. The GGP ones.
  • Neural network slides (from Tom Mitchell's book)
  • Slides from Gelly's thesis (I showed the UCT part in class): ppt.
  • Motif backgammon (online player)
  • GNU backgammon
  • Practical Issues in Temporal Difference Learning: an earlier paper by Tesauro (with a few more details)
  • Modular Neural Networks for Learning Context-Dependent Game Strategies, Justin Boyan, 1992: a partial replication of TD-gammon.
  • A more complete overview of UCT as applied to Go: "Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go". Gelly and Silver. To appear in AIJ.
  • Some papers from Simon Lucas' group on comparing TD learning and co-evolution in various games: Othello; Go; Simple grid-world Treasure hunt; Ms. Pac-Man.
  • Some papers from the UT Learning Agents Research Group on General Game Playing

Week 10: Financial Applications

  • Slides from 3/27: pdf. The ones on RRL
  • A more recent RRL extension by Maringer and Ramtohul.
  • Censored Exploration and the Dark Pool Problem.
    Kuzman Ganchev, Michael Kearns, Yuriy Nevmyvaka, Jennifer Wortman Vaughan
    UAI 2009.
  • Kearns' 2012 STOC tutorial on computational finance.
  • A Microsoft talk by Kearns on the topic.
  • A page on advantage learning, with links to relevant papers.
  • Three Automated Stock-Trading Agents: A Comparative Study.
    Alexander Sherstov and Peter Stone.
    In AMEC 2004.
  • The slides

Week 11: Robotics Applications

  • Slides from 4/4: pdf. The ones on walking Aibos.
  • A video from Riedmiller's group
  • Adaptive Choice of Grid and Time in Reinforcement Learning. Stephan Pareigis, NIPS 1997.
  • This paper compares the policy gradient RL method with other algorithms on the walk learning: Machine Learning for Fast Quadrupedal Locomotion. Kohl and Stone. AAAI 2004.
  • Szita and Lőrincz: Learning Tetris Using the Noisy Cross-Entropy Method.
  • The original PEGASUS paper.
  • Some of the helicopter videos
  • Some other papers on helicopter control and soccer:
  • Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods.
    J. Bagnell and J. Schneider
    Proceedings of the International Conference on Robotics and Automation 2001, IEEE, May, 2001.
  • Scaling Reinforcement Learning toward RoboCup Soccer.
    Peter Stone and Richard S. Sutton.
    Proceedings of the Eighteenth International Conference on Machine Learning, pp. 537-544, Morgan Kaufmann, San Francisco, CA, 2001.
  • The UT Austin Villa RoboCup team home page.
  • Greg Kuhlmann's follow-up on progress in 3v2 keepaway
  • Kalyanakrishnan et al.: Model-based Reinforcement Learning in a Complex Domain.
  • Making a Robot Learn to Play Soccer Using Reward and Punishment.
    Heiko Müller, Martin Lauer, Roland Hafner, Sascha Lange, Artur Merke and Martin Riedmiller.
    30th Annual German Conference on AI, KI 2007.
  • Reinforcement Learning for Sensing Strategies.
    C. Kwok and D. Fox.
    Proceedings of IROS, 2004.
  • Learning from Observation and Practice Using Primitives.
    Darrin Bentivegna, Christopher Atkeson, and Gordon Cheng.
    AAAI Fall Symposium on Real Life Reinforcement Learning, 2004.
  • Natural Actor Critic.
    Jan Peters and Stefan Schaal
    Neurocomputing 2008. Earlier version in ECML 2005.
  • TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots.
    Todd Hester and Peter Stone
    Machine Learning 2012
  • PILCO: A Model-Based and Data-Efficient Approach to Policy Search.
    Marc Peter Deisenroth and Carl Edward Rasmussen
    ICML 2011

Week 12: Practical RL and Transfer Learning

  • Slides from 4/10: pdf. The ones on the MLJ article. The ones on CMA-ES. The ones on transfer learning.
  • Comparing Evolutionary and Temporal Difference Methods for Reinforcement Learning.
    Matthew Taylor, Shimon Whiteson, and Peter Stone.
    In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1321-28, July 2006.
  • Improving Action Selection in MDP's via Knowledge Transfer.
    Alexander A. Sherstov and Peter Stone.
    In Proceedings of the Twentieth National Conference on Artificial Intelligence, July 2005.
    Associated slides.
  • General Game Learning using Knowledge Transfer.
    Bikramjit Banerjee and Peter Stone.
    In The 20th International Joint Conference on Artificial Intelligence, 2007
    Associated slides.

Week 13: Health and Sustainability

  • Slides from 4/17: pdf.
  • An Intelligent Battery Controller Using Bias-Corrected Q-learning.
    Donghoon Lee and Warren Powell
    AAAI 2012.
  • Learning Policies For Battery Usage Optimization in Electric Vehicles.
    Stefano Ermon, Yexiang Xue, Carla Gomes, and Bart Selman
    ECML 2012.
  • A Learning Agent for Heat-Pump Thermostat Control.
    Daniel Urieli and Peter Stone.
    In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2013.
    Associated slides.

Week 14: Least Squares Methods

  • Slides from 4/24: pdf. The ones on LSPI from Alan Fern (based on Ron Parr's).
  • Yaroslav's slides, his derivation of the equation at the end of Section 3, and some notes he wrote up on representing/solving MDPs in Matrix notation.
  • Policy Iteration for Factored MDPs
    by Daphne Koller and Ronald Parr: UAI 2000
    Some related slides
  • Online Exploration in Least-Squares Policy Iteration
    by Lihong Li, Michael L. Littman, and Christopher R. Mansley: AAMAS 2009
    The slides from AAMAS 2009.
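  • A minimal batch LSTD(0) sketch (not from the papers above; the two-feature transitions at the end are made up) showing the least-squares solve these methods build on: accumulate A as the sum of phi(s)(phi(s) - gamma*phi(s'))^T and b as the sum of r*phi(s), then solve A w = b:

      import numpy as np

      def lstd(transitions, n_features, gamma=0.99, reg=1e-6):
          A = reg * np.eye(n_features)      # small ridge term keeps A invertible
          b = np.zeros(n_features)
          for phi_s, r, phi_s2, done in transitions:
              phi_s = np.asarray(phi_s, dtype=float)
              phi_next = np.zeros(n_features) if done else np.asarray(phi_s2, dtype=float)
              A += np.outer(phi_s, phi_s - gamma * phi_next)
              b += r * phi_s
          return np.linalg.solve(A, b)

      # Two hand-made transitions: (features(s), reward, features(s'), terminal?)
      data = [([1.0, 0.0], 0.0, [0.0, 1.0], False),
              ([0.0, 1.0], 1.0, [0.0, 0.0], True)]
      print(lstd(data, 2))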

Week 15: Multiagent RL

  • The slides on threats (ps).
  • Busoniu, L. and Babuska, R. and De Schutter, B.
    A comprehensive survey of multiagent reinforcement learning
    IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156-172, 2008.
  • Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents
    by Ming Tan
  • Michael Bowling
    Convergence and No-Regret in Multiagent Learning
    NIPS 2004
  • Kok, J.R. and Vlassis, N., Collaborative multiagent reinforcement learning by payoff propagation, Journal of Machine Learning Research, 7:1789-1828, 2006.
  • A brief survey on multiagent learning by Doran Chakraborty.
  • gametheory.net
  • Some useful slides (Part C) from Michael Bowling on game theory, stochastic games, correlated equilibria; and (Part D) from Michael Littman with more on stochastic games.
  • A suite of game generators called GAMUT from Stanford.
  • RoShamBo (rock-paper-scissors) contest
  • U. of Alberta page on automated poker.

Final Project

    RL-Glue: http://glue.rl-community.org/wiki/Main_Page
    The following paper gives you an idea of what RL-Glue offers: pdf
    It is language-independent: you can write your agent in any language you choose. Its main purpose is to provide a standard platform on which everybody can test their RL algorithms and report results.
    More information can be found here: link
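
    As a rough illustration (not the actual RL-Glue codec API; the class and method names below are assumptions for illustration only), an agent on such a platform is typically written as a set of callbacks that the experiment driver invokes:

      import random

      class RandomAgent:
          """A random agent exposing start/step/end callbacks; names are illustrative."""
          def __init__(self, num_actions):
              self.num_actions = num_actions

          def agent_start(self, observation):
              # First action of an episode, given the initial observation.
              return random.randrange(self.num_actions)

          def agent_step(self, reward, observation):
              # Action after each subsequent reward and observation.
              return random.randrange(self.num_actions)

          def agent_end(self, reward):
              # Called when the episode terminates.
              pass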

    PyBrain: link
    PyBrain, as its full name suggests, contains algorithms for neural networks, for reinforcement learning (and the combination of the two), for unsupervised learning, and for evolution. Since many current problems involve continuous state and action spaces, function approximators (such as neural networks) are needed to cope with the large dimensionality. The library is built around neural networks at its core, and all of the training methods accept a neural network as the instance to be trained.
    I believe this can be a good resource for those who are planning to work on continuous domains.


    Page maintained by Peter Stone
    Questions? Send me mail