Markov Decision Processes
The Markov Decision Process (MDP) is the formalism underlying modern value-function-based reinforcement learning.
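As a brief illustration (not drawn from any of the papers listed below), a finite MDP specifies states, actions, transition probabilities, rewards, and a discount factor, and its optimal value function can be computed by value iteration. The two-state transition table in this sketch is a made-up example:

```python
# Minimal value-iteration sketch on a hypothetical 2-state, 2-action MDP.
# P[s][a] is a list of (probability, next_state, reward) triples;
# the numbers are invented purely for illustration.
P = {
    0: {0: [(0.9, 0, 0.0), (0.1, 1, 1.0)],
        1: [(0.5, 0, 0.0), (0.5, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 2.0), (0.2, 0, 0.0)]},
}

def value_iteration(P, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values converge."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Back up: best expected one-step return plus discounted value.
            v_new = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    # Extract the greedy policy from the converged value function.
    pi = {
        s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                       for p, s2, r in P[s][a]))
        for s in P
    }
    return V, pi

V, pi = value_iteration(P)
```

Here state 1 is more valuable than state 0 (its second action earns reward repeatedly), so the greedy policy steers toward it from both states.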
Craig Corcoran, Ph.D. Student, ccor [at] cs utexas edu
Elad Liebman, Ph.D. Student, eladlieb [at] cs utexas edu
Peter Stone, Faculty, pstone [at] cs utexas edu
Learning from Human-Generated Reward 2012
W. Bradley Knox.
TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains 2012
Todd Hester, Ph.D. Thesis, The University of Texas at Austin. Code available at:
The Nature of Belief-Directed Exploratory Choice in Human Decision-Making 2012
W. Bradley Knox, A. Ross Otto, Peter Stone, and Bradley Love, Frontiers in Psychology, Vol. 2 (2012). The paper can be accessed at:
Learning Complementary Multiagent Behaviors: A Case Study 2009
Shivaram Kalyanakrishnan and Peter Stone, In Proceedings of the RoboCup International Symposium 2009. Springer Verlag.
Bayesian Models of Nonstationary Markov Decision Problems 2005
Nicholas K. Jong and Peter Stone, In IJCAI 2005 workshop on Planning and Learning in A Priori Unknown or Dynamic Domains, August 2005.
Improving Action Selection in MDP's via Knowledge Transfer 2005
Alexander A. Sherstov and Peter Stone, In Proceedings of the Twentieth National Conference on Artificial Intelligence, July 2005.
State Abstraction Discovery from Irrelevant State Variables 2005
Nicholas K. Jong and Peter Stone, In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, pp. 752-757, August 2005.
Towards Learning to Ignore Irrelevant State Variables 2004
Nicholas K. Jong and Peter Stone, In The AAAI-2004 Workshop on Learning and Planning in Markov Processes -- Advances and Challenges, 2004.