 Learning Methods for Sequential Decision Making with Imperfect Representations
 Shivaram Kalyanakrishnan, 2011
 Near-optimal Regret Bounds for Reinforcement Learning
Thomas Jaksch, Ronald Ortner, and Peter Auer, 2010
 Algorithms for Reinforcement Learning
 Csaba Szepesvári, 2010
 Evolving Neural Networks for Strategic Decision-Making Problems
Nate Kohl and Risto Miikkulainen, 2009
 Reinforcement learning in the brain
 Yael Niv, 2009
 Reinforcement Learning in Finite MDPs: PAC Analysis
Alexander L. Strehl, Lihong Li, and Michael L. Littman, 2009
 On the role of tracking in stationary environments
Richard S. Sutton, Anna Koop, and David Silver, 2007
 Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
Matthew E. Taylor, Peter Stone, and Yaxin Liu, 2007
 PAC model-free reinforcement learning
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, 2006
 Why (PO)MDPs Lose for Spatial Tasks and What to Do About It
Terran Lane and William D. Smart, 2005
 A theoretical analysis of Model-Based Interval Estimation
Alexander L. Strehl and Michael L. Littman, 2005
 Temporal difference models describe higher-order learning in humans
Ben Seymour, John P. O'Doherty, Peter Dayan, Martin Koltzenburg, Anthony K. Jones, Raymond J. Dolan, Karl J. Friston, and Richard S. Frackowiak, 2004
 R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning
Ronen I. Brafman and Moshe Tennenholtz, 2003
 Near-Optimal Reinforcement Learning in Polynomial Time
Michael Kearns and Satinder Singh, 2002
 Eligibility Traces for Off-Policy Policy Evaluation
Doina Precup, Richard S. Sutton, and Satinder P. Singh, 2000
 Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Satinder Singh, Tommi Jaakkola, Michael L. Littman, and Csaba Szepesvári, 2000
 Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
Andrew Y. Ng, Daishi Harada, and Stuart J. Russell, 1999
 Learning to Drive a Bicycle Using Reinforcement Learning and Shaping
Jette Randløv and Preben Alstrøm, 1998
 Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto, 1998
 No free lunch theorems for optimization
David H. Wolpert and William G. Macready, 1997
 Neuro-Dynamic Programming
Dimitri P. Bertsekas and John N. Tsitsiklis, 1996
 On the Complexity of Solving Markov Decision Problems
Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling, 1995
 Markov Decision Processes
 Martin L. Puterman, 1994
 On-line Q-learning using connectionist systems
G. A. Rummery and M. Niranjan, 1994
 On bias and step size in temporal-difference learning
Richard S. Sutton and Satinder P. Singh, 1994
 Practical Issues in Temporal Difference Learning
 Gerald Tesauro, 1992
 Q-Learning
Christopher J. C. H. Watkins and Peter Dayan, 1992
 Learning to Predict By the Methods of Temporal Differences
 Richard S. Sutton, 1988
 Dynamic Programming
 Richard Bellman, 1957
 Some aspects of the sequential design of experiments
 Herbert Robbins, 1952