Shivaram's Reading List


General RL

Learning Methods for Sequential Decision Making with Imperfect Representations
Shivaram Kalyanakrishnan, 2011

Near-optimal Regret Bounds for Reinforcement Learning
Thomas Jaksch, Ronald Ortner, and Peter Auer, 2010

Algorithms for Reinforcement Learning
Csaba Szepesvári, 2010

Evolving Neural Networks for Strategic Decision-Making Problems
Nate Kohl and Risto Miikkulainen, 2009

Reinforcement learning in the brain
Yael Niv, 2009

Reinforcement Learning in Finite MDPs: PAC Analysis
Alexander L. Strehl, Lihong Li, and Michael L. Littman, 2009

On the role of tracking in stationary environments
Richard S. Sutton, Anna Koop, and David Silver, 2007

Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
Matthew E. Taylor, Peter Stone, and Yaxin Liu, 2007

PAC model-free reinforcement learning
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, 2006

Why (PO)MDPs Lose for Spatial Tasks and What to Do About It
Terran Lane and William D. Smart, 2005

A theoretical analysis of Model-Based Interval Estimation
Alexander L. Strehl and Michael L. Littman, 2005

Temporal difference models describe higher-order learning in humans
Ben Seymour, John P. O'Doherty, Peter Dayan, Martin Koltzenburg, Anthony K. Jones, Raymond J. Dolan, Karl J. Friston, and Richard S. Frackowiak, 2004

R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning
Ronen I. Brafman and Moshe Tennenholtz, 2003

Near-Optimal Reinforcement Learning in Polynomial Time
Michael Kearns and Satinder Singh, 2002

Eligibility Traces for Off-Policy Policy Evaluation
Doina Precup, Richard S. Sutton, and Satinder P. Singh, 2000

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Satinder Singh, Tommi Jaakkola, Michael L. Littman, and Csaba Szepesvári, 2000

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
Andrew Y. Ng, Daishi Harada, and Stuart J. Russell, 1999

Learning to Drive a Bicycle Using Reinforcement Learning and Shaping
Jette Randløv and Preben Alstrøm, 1998

Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto, 1998

No free lunch theorems for optimization
David H. Wolpert and William G. Macready, 1997

Neuro-Dynamic Programming
Dimitri P. Bertsekas and John N. Tsitsiklis, 1996

On the Complexity of Solving Markov Decision Problems
Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling, 1995

Markov Decision Processes
Martin L. Puterman, 1994

On-line Q-learning using connectionist systems
G. A. Rummery and M. Niranjan, 1994

On bias and step size in temporal-difference learning
Richard S. Sutton and Satinder P. Singh, 1994

Practical Issues in Temporal Difference Learning
Gerald Tesauro, 1992

Q-Learning
Christopher J. C. H. Watkins and Peter Dayan, 1992

Learning to Predict By the Methods of Temporal Differences
Richard S. Sutton, 1988

Dynamic Programming
Richard Bellman, 1957

Some aspects of the sequential design of experiments
Herbert Robbins, 1952