 Exploiting Best-Match Equations for Efficient Reinforcement Learning
 Harm van Seijen,  Shimon Whiteson,  Hado van Hasselt, and  Marco Wiering, 2011
 Insights in Reinforcement Learning: formal analysis and empirical evaluation of temporal-difference learning algorithms
 Hado Philip van Hasselt, 2011
 Relative Entropy Policy Search
 Jan Peters,  Katharina Mülling, and  Yasemin Altün, 2010
 Model-based reinforcement learning with nearly tight exploration complexity bounds
 István Szita and  Csaba Szepesvári, 2010
 Reinforcement learning of motor skills in high dimensions: A path integral approach
 Evangelos Theodorou,  Jonas Buchli, and  Stefan Schaal, 2010
 The CMA Evolution Strategy: A Tutorial
 Nikolaus Hansen, 2009
 Learning motor primitives for robotics
 Jens Kober and  Jan Peters, 2009
 Efficient covariance matrix update for variable metric evolution strategies
 Thorsten Suttorp,  Nikolaus Hansen, and  Christian Igel, 2009
 A Theoretical and Empirical Analysis of Expected Sarsa
 Harm van Seijen,  Hado van Hasselt,  Shimon Whiteson, and  Marco Wiering, 2009
 Incremental Natural Actor-Critic Algorithms
 Shalabh Bhatnagar,  Richard S. Sutton,  Mohammad Ghavamzadeh, and  Mark Lee, 2008
 Accelerated Neural Evolution through Cooperatively Coevolved Synapses
 Faustino Gomez,  Jürgen Schmidhuber, and  Risto Miikkulainen, 2008
 Similarities and differences between policy gradient methods and evolution strategies
 Verena Heidrich-Meisner and  Christian Igel, 2008
 Evolution Strategies for Direct Policy Search
 Verena Heidrich-Meisner and  Christian Igel, 2008
 Genetic Programming: An Introduction and Tutorial, with a Survey of Techniques and Applications
 William B. Langdon,  Riccardo Poli,  Nicholas Freitag McPhee, and  John R. Koza, 2008
 Analysis of an Evolutionary Reinforcement Learning Method in a Multiagent Domain
 Jan Hendrik Metzen,  Mark Edgington,  Yohannes Kassahun, and  Frank Kirchner, 2008
 Reinforcement learning of motor skills with policy gradients
 Jan Peters and  Stefan Schaal, 2008
 Natural Actor-Critic
 Jan Peters and  Stefan Schaal, 2008
 Sample-based Learning and Search with Permanent and Transient Memories
 David Silver,  Richard S. Sutton, and  Martin Müller, 2008
 Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
 Richard S. Sutton,  Csaba Szepesvári,  Alborz Geramifard, and  Michael Bowling, 2008
 Sample Complexity of Policy Search with Known Dynamics
 Peter L. Bartlett and  Ambuj Tewari, 2007
 Bayesian actor-critic algorithms
 Mohammad Ghavamzadeh and  Yaakov Engel, 2007
 Bayesian Policy Gradient Algorithms
 Mohammad Ghavamzadeh and  Yaakov Engel, 2007
 Batch Reinforcement Learning in a Complex Domain
 Shivaram Kalyanakrishnan and  Peter Stone, 2007
 Large Scale Reinforcement Learning using Q-Sarsa($\lambda$) and Cascading Neural Networks
 Steffen Nissen, 2007
 Representation Transfer for Reinforcement Learning
 Matthew E. Taylor and  Peter Stone, 2007
 Adaptive Representations for Reinforcement Learning
 Shimon Azariah Whiteson, 2007
 Evolutionary Function Approximation for Reinforcement Learning
 Shimon Whiteson and  Peter Stone, 2006
 On-line evolutionary computation for reinforcement learning in stochastic domains
 Shimon Whiteson and  Peter Stone, 2006
 Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
 Martin Riedmiller, 2005
 A Tutorial on the Cross-Entropy Method
 Pieter-Tjerk de Boer,  Dirk P. Kroese,  Shie Mannor, and  Reuven Y. Rubinstein, 2005
 Machine Learning for Fast Quadrupedal Locomotion
 Nate Kohl and  Peter Stone, 2004
 Efficient Evolution of Neural Networks Through Complexification
 Kenneth Owen Stanley, 2004
 On Actor-Critic Algorithms
 Vijay R. Konda and  John N. Tsitsiklis, 2003
 Reinforcement Learning as Classification: Leveraging Modern Classifiers
 Michail G. Lagoudakis and  Ronald Parr, 2003
 Scaling Internal-State Policy-Gradient Methods for POMDPs
 Douglas Aberdeen and  Jonathan Baxter, 2002
 Approximately Optimal Approximate Reinforcement Learning
 Sham Kakade and  John Langford, 2002
 Learning from Scarce Experience
 Leonid Peshkin and  Christian R. Shelton, 2002
 Infinite-Horizon Policy-Gradient Estimation
 Jonathan Baxter and  Peter L. Bartlett, 2001
 A Natural Policy Gradient
 Sham Kakade, 2001
 Reinforcement Learning in POMDP's via Direct Gradient Ascent
 Jonathan Baxter and  Peter L. Bartlett, 2000
 Policy Search via Density Estimation
 Andrew Y. Ng,  Ronald Parr, and  Daphne Koller, 2000
 PEGASUS: A policy search method for large MDPs and POMDPs
 Andrew Y. Ng and  Michael Jordan, 2000
 Policy Gradient Methods for Reinforcement Learning with Function Approximation
 Richard S. Sutton,  David A. McAllester,  Satinder P. Singh, and  Yishay Mansour, 2000
 Gradient Descent for General Reinforcement Learning
 Leemon Baird and  Andrew Moore, 1999
 Solving Non-Markovian Control Tasks with Neuro-Evolution
 Faustino J. Gomez and  Risto Miikkulainen, 1999
 Evolutionary Algorithms for Reinforcement Learning
 David E. Moriarty,  Alan C. Schultz, and  John J. Grefenstette, 1999
 Robot Shaping: An Experiment in Behavior Engineering
 Marco Dorigo and  Marco Colombetti, 1998
 Reinforcement Learning: An Introduction
 Richard S. Sutton and  Andrew G. Barto, 1998
 Neuro-Dynamic Programming
 Dimitri P. Bertsekas and  John N. Tsitsiklis, 1996
 Reinforcement learning with replacing eligibility traces
 Satinder P. Singh and  Richard S. Sutton, 1996
 On-line Q-learning using connectionist systems
 G. A. Rummery and  M. Niranjan, 1994
 Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time
 Andrew W. Moore and  Christopher G. Atkeson, 1993
 Efficient learning and planning within the Dyna framework
 Jing Peng and  Ronald J. Williams, 1993
 Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching
 Long-Ji Lin, 1992
 Q-Learning
 Christopher J. C. H. Watkins and  Peter Dayan, 1992
 Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
 Ronald J. Williams, 1992
 Learning Sequential Decision Rules Using Simulation Models and Competition
 John J. Grefenstette,  Connie Loggia Ramsey, and  Alan C. Schultz, 1990
 Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
 Richard S. Sutton, 1990
 Neuronlike adaptive elements that can solve difficult learning control problems
 Andrew G. Barto,  Richard S. Sutton, and  Charles W. Anderson, 1983