Markov Decision Processes
The Markov Decision Process (MDP) is the formalism underlying modern value-function-based reinforcement learning.
Elad Liebman, Ph.D. Student, eladlieb [at] cs utexas edu
Jacob Menashe, Ph.D. Student, jmenashe [at] cs utexas edu
Sanmit Narvekar, Ph.D. Student, sanmit [at] cs utexas edu
Peter Stone, Faculty, pstone [at] cs utexas edu
On Sampling Error in Batch Action-Value Prediction Algorithms 2020
Brahma S. Pavse, Josiah P. Hanna, Ishan Durugkar, and Peter Stone, In the Offline Reinforcement Learning Workshop at Neural Information Processing Systems (NeurIPS), Remote (Virtual Conference), December 2020.
Reducing Sampling Error in Batch Temporal Difference Learning 2020
Brahma Pavse, Ishan Durugkar, Josiah Hanna, and Peter Stone, In Proceedings of the 37th International Conference on Machine Learning (ICML), Vienna, Austria (Virtual Conference), July 2020.
Building Self-Play Curricula Online by Playing with Expert Agents in Adversarial Games 2019
Felipe Leno Da Silva, Anna Helena Reali Costa, and Peter Stone, In Proceedings of the 8th Brazilian Conference on Intelligent Systems (BRACIS), Salvador, Bahia, Brazil, October 2019.
On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search 2016
Piyush Khandelwal, Elad Liebman, Scott Niekum, and Peter Stone, In Proceedings of The 33rd International Conference on Machine Learning, pp. 1319--1328, New York City, NY, USA, June 2016.
Autonomous Trading in Modern Electricity Markets 2015
Daniel Urieli, PhD Thesis, Department of Computer Sciences, The University of Texas at Austin. Code and binaries available at:
Deep Recurrent Q-Learning for Partially Observable MDPs 2015
Matthew Hausknecht and Peter Stone, In AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (AAAI-SDMIA15), Arlington, Virginia, USA, November 2015.
Framing reinforcement learning from human reward: Reward positivity, temporal discounting, episodicity, and performance 2015
W. Bradley Knox and Peter Stone, Artificial Intelligence, Vol. 225 (2015).
Leading the Way: An Efficient Multi-robot Guidance System 2015
Piyush Khandelwal, Samuel Barrett, and Peter Stone, In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Istanbul, Turkey, May 2015.
Learning from Human-Generated Reward 2012
W. Bradley Knox, No other information
TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains 2012
Todd Hester, PhD Thesis, The University of Texas at Austin. Code available at:
The Nature of Belief-Directed Exploratory Choice in Human Decision-Making 2012
W. Bradley Knox, A. Ross Otto, Peter Stone, and Bradley Love, Frontiers in Psychology, Vol. 2 (2012). The paper can be accessed at:
Learning Complementary Multiagent Behaviors: A Case Study 2009
Shivaram Kalyanakrishnan and Peter Stone, In Proceedings of the RoboCup International Symposium 2009. Springer Verlag.
Bayesian Models of Nonstationary Markov Decision Problems 2005
Nicholas K. Jong and Peter Stone, In IJCAI 2005 workshop on Planning and Learning in A Priori Unknown or Dynamic Domains, August 2005.
Improving Action Selection in MDP's via Knowledge Transfer 2005
Alexander A. Sherstov and Peter Stone, In Proceedings of the Twentieth National Conference on Artificial Intelligence, July 2005.
State Abstraction Discovery from Irrelevant State Variables 2005
Nicholas K. Jong and Peter Stone, In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, pp. 752-757, August 2005.
Towards Learning to Ignore Irrelevant State Variables 2004
Nicholas K. Jong and Peter Stone, In The AAAI-2004 Workshop on Learning and Planning in Markov Processes -- Advances and Challenges, 2004.