Shivaram's Reading List

Learning Methods

Exploiting Best-Match Equations for Efficient Reinforcement Learning
Harm van Seijen, Shimon Whiteson, Hado van Hasselt, and Marco Wiering, 2011

Insights in Reinforcement Learning: formal analysis and empirical evaluation of temporal-difference learning algorithms
Hado Philip van Hasselt, 2011

Relative Entropy Policy Search
Jan Peters, Katharina Mülling, and Yasemin Altün, 2010

Model-based reinforcement learning with nearly tight exploration complexity bounds
István Szita and Csaba Szepesvári, 2010

Reinforcement learning of motor skills in high dimensions: A path integral approach
Evangelos Theodorou, Jonas Buchli, and Stefan Schaal, 2010

The CMA Evolution Strategy: A Tutorial
Nikolaus Hansen, 2009

Learning motor primitives for robotics
Jens Kober and Jan Peters, 2009

Efficient covariance matrix update for variable metric evolution strategies
Thorsten Suttorp, Nikolaus Hansen, and Christian Igel, 2009

A Theoretical and Empirical Analysis of Expected Sarsa
Harm van Seijen, Hado van Hasselt, Shimon Whiteson, and Marco Wiering, 2009

Incremental Natural Actor-Critic Algorithms
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, and Mark Lee, 2008

Accelerated Neural Evolution through Cooperatively Coevolved Synapses
Faustino Gomez, Jürgen Schmidhuber, and Risto Miikkulainen, 2008

Similarities and differences between policy gradient methods and evolution strategies
Verena Heidrich-Meisner and Christian Igel, 2008

Evolution Strategies for Direct Policy Search
Verena Heidrich-Meisner and Christian Igel, 2008

Genetic Programming: An Introduction and Tutorial, with a Survey of Techniques and Applications
William B. Langdon, Riccardo Poli, Nicholas Freitag McPhee, and John R. Koza, 2008

Analysis of an Evolutionary Reinforcement Learning Method in a Multiagent Domain
Jan Hendrik Metzen, Mark Edgington, Yohannes Kassahun, and Frank Kirchner, 2008

Reinforcement learning of motor skills with policy gradients
Jan Peters and Stefan Schaal, 2008

Natural Actor-Critic
Jan Peters and Stefan Schaal, 2008

Sample-based Learning and Search with Permanent and Transient Memories
David Silver, Richard S. Sutton, and Martin Müller, 2008

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, and Michael Bowling, 2008

Sample Complexity of Policy Search with Known Dynamics
Peter L. Bartlett and Ambuj Tewari, 2007

Bayesian actor-critic algorithms
Mohammad Ghavamzadeh and Yaakov Engel, 2007

Bayesian Policy Gradient Algorithms
Mohammad Ghavamzadeh and Yaakov Engel, 2007

Batch Reinforcement Learning in a Complex Domain
Shivaram Kalyanakrishnan and Peter Stone, 2007

Large Scale Reinforcement Learning using Q-Sarsa(λ) and Cascading Neural Networks
Steffen Nissen, 2007

Representation Transfer for Reinforcement Learning
Matthew E. Taylor and Peter Stone, 2007

Adaptive Representations for Reinforcement Learning
Shimon Azariah Whiteson, 2007

Evolutionary Function Approximation for Reinforcement Learning
Shimon Whiteson and Peter Stone, 2006

On-line evolutionary computation for reinforcement learning in stochastic domains
Shimon Whiteson and Peter Stone, 2006

Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
Martin Riedmiller, 2005

A Tutorial on the Cross-Entropy Method
Pieter-Tjerk de Boer, Dirk P. Kroese, Shie Mannor, and Reuven Y. Rubinstein, 2005

Machine Learning for Fast Quadrupedal Locomotion
Nate Kohl and Peter Stone, 2004

Efficient Evolution of Neural Networks Through Complexification
Kenneth Owen Stanley, 2004

On Actor-Critic Algorithms
Vijay R. Konda and John N. Tsitsiklis, 2003

Reinforcement Learning as Classification: Leveraging Modern Classifiers
Michail G. Lagoudakis and Ronald Parr, 2003

Scaling Internal-State Policy-Gradient Methods for POMDPs
Douglas Aberdeen and Jonathan Baxter, 2002

Approximately Optimal Approximate Reinforcement Learning
Sham Kakade and John Langford, 2002

Learning from Scarce Experience
Leonid Peshkin and Christian R. Shelton, 2002

Infinite-Horizon Policy-Gradient Estimation
Jonathan Baxter and Peter L. Bartlett, 2001

A Natural Policy Gradient
Sham Kakade, 2001

Reinforcement Learning in POMDP's via Direct Gradient Ascent
Jonathan Baxter and Peter L. Bartlett, 2000

Policy Search via Density Estimation
Andrew Y. Ng, Ronald Parr, and Daphne Koller, 2000

PEGASUS: A policy search method for large MDPs and POMDPs
Andrew Y. Ng and Michael Jordan, 2000

Policy Gradient Methods for Reinforcement Learning with Function Approximation
Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour, 2000

Gradient Descent for General Reinforcement Learning
Leemon Baird and Andrew Moore, 1999

Solving Non-Markovian Control Tasks with Neuro-Evolution
Faustino J. Gomez and Risto Miikkulainen, 1999

Evolutionary Algorithms for Reinforcement Learning
David E. Moriarty, Alan C. Schultz, and John J. Grefenstette, 1999

Robot Shaping: An Experiment in Behavior Engineering
Marco Dorigo and Marco Colombetti, 1998

Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto, 1998

Neuro-Dynamic Programming
Dimitri P. Bertsekas and John N. Tsitsiklis, 1996

Reinforcement learning with replacing eligibility traces
Satinder P. Singh and Richard S. Sutton, 1996

On-line Q-learning using connectionist systems
G. A. Rummery and M. Niranjan, 1994

Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time
Andrew W. Moore and Christopher G. Atkeson, 1993

Efficient learning and planning within the Dyna framework
Jing Peng and Ronald J. Williams, 1993

Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching
Long-Ji Lin, 1992

Q-Learning
Christopher J. C. H. Watkins and Peter Dayan, 1992

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
Ronald J. Williams, 1992

Learning Sequential Decision Rules Using Simulation Models and Competition
John J. Grefenstette, Connie Loggia Ramsey, and Alan C. Schultz, 1990

Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
Richard S. Sutton, 1990

Neuronlike adaptive elements that can solve difficult learning control problems
Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson, 1983