 Learning Methods for Sequential Decision Making with Imperfect Representations
 Shivaram Kalyanakrishnan, 2011
 Near-optimal Regret Bounds for Reinforcement Learning
Thomas Jaksch, Ronald Ortner, and Peter Auer, 2010
 Algorithms for Reinforcement Learning
 Csaba Szepesvári, 2010
 Evolving Neural Networks for Strategic Decision-Making Problems
Nate Kohl and Risto Miikkulainen, 2009
 Reinforcement learning in the brain
 Yael Niv, 2009
 Reinforcement Learning in Finite MDPs: PAC Analysis
Alexander L. Strehl, Lihong Li, and Michael L. Littman, 2009
 On the role of tracking in stationary environments
Richard S. Sutton, Anna Koop, and David Silver, 2007
 Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
Matthew E. Taylor, Peter Stone, and Yaxin Liu, 2007
 PAC model-free reinforcement learning
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman, 2006
 Why (PO)MDPs Lose for Spatial Tasks and What to Do About It
Terran Lane and William D. Smart, 2005
 A theoretical analysis of Model-Based Interval Estimation
Alexander L. Strehl and Michael L. Littman, 2005
 Temporal difference models describe higher-order learning in humans
Ben Seymour, John P. O'Doherty, Peter Dayan, Martin Koltzenburg, Anthony K. Jones, Raymond J. Dolan, Karl J. Friston, and Richard S. Frackowiak, 2004
 R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning
Ronen I. Brafman and Moshe Tennenholtz, 2003
 Near-Optimal Reinforcement Learning in Polynomial Time
Michael Kearns and Satinder Singh, 2002
 Eligibility Traces for Off-Policy Policy Evaluation
Doina Precup, Richard S. Sutton, and Satinder P. Singh, 2000
 Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Satinder Singh, Tommi Jaakkola, Michael L. Littman, and Csaba Szepesvári, 2000
 Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping
Andrew Y. Ng, Daishi Harada, and Stuart J. Russell, 1999
 Learning to Drive a Bicycle Using Reinforcement Learning and Shaping
Jette Randløv and Preben Alstrøm, 1998
 Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto, 1998
 No free lunch theorems for optimization
David H. Wolpert and William G. Macready, 1997
 Neuro-Dynamic Programming
Dimitri P. Bertsekas and John N. Tsitsiklis, 1996
 On the Complexity of Solving Markov Decision Problems
Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling, 1995
 Markov Decision Processes
 Martin L. Puterman, 1994
 On-line Q-learning using connectionist systems
G. A. Rummery and M. Niranjan, 1994
 On bias and step size in temporal-difference learning
Richard S. Sutton and Satinder P. Singh, 1994
 Practical Issues in Temporal Difference Learning
 Gerald Tesauro, 1992
 Q-Learning
Christopher J. C. H. Watkins and Peter Dayan, 1992
 Learning to Predict By the Methods of Temporal Differences
 Richard S. Sutton, 1988
 Dynamic Programming
 Richard Bellman, 1957
 Some aspects of the sequential design of experiments
 Herbert Robbins, 1952