AI Lab Areas - Markov Decision Processes

Markov Decision Processes

The Markov Decision Process (MDP) is the formalism underlying modern value-function-based reinforcement learning.

Elad Liebman	Ph.D. Student	eladlieb [at] cs utexas edu
Jacob Menashe	Ph.D. Student	jmenashe [at] cs utexas edu
Sanmit Narvekar	Ph.D. Student	sanmit [at] cs utexas edu
Peter Stone	Faculty	pstone [at] cs utexas edu

Publications

On Sampling Error in Batch Action-Value Prediction Algorithms	2020
Brahma S. Pavse, Josiah P. Hanna, Ishan Durugkar, and Peter Stone, In the Offline Reinforcement Learning Workshop at Neural Information Processing Systems (NeurIPS), Remote (Virtual Conference), December 2020.
Reducing Sampling Error in Batch Temporal Difference Learning	2020
Brahma Pavse, Ishan Durugkar, Josiah Hanna, and Peter Stone, In Proceedings of the 37th International Conference on Machine Learning (ICML), Vienna, Austria (Virtual Conference), July 2020.
Building Self-Play Curricula Online by Playing with Expert Agents in Adversarial Games	2019
Felipe Leno Da Silva, Anna Helena Reali Costa, and Peter Stone, In Proceedings of the 8th Brazilian Conference on Intelligent Systems (BRACIS), Salvador, Bahia, Brazil, October 2019.
On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search	2016
Piyush Khandelwal, Elad Liebman, Scott Niekum, and Peter Stone, In Proceedings of The 33rd International Conference on Machine Learning, pp. 1319-1328, New York City, NY, USA, June 2016.
Autonomous Trading in Modern Electricity Markets	2015
Daniel Urieli, PhD Thesis, Department of Computer Sciences, The University of Texas at Austin. Code and binaries available at: http://www.cs.utexas.edu/~urieli/thesis.
Deep Recurrent Q-Learning for Partially Observable MDPs	2015
Matthew Hausknecht and Peter Stone, In AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (AAAI-SDMIA15), Arlington, Virginia, USA, November 2015.
Framing reinforcement learning from human reward: Reward positivity, temporal discounting, episodicity, and performance	2015
W. Bradley Knox and Peter Stone, Artificial Intelligence, Vol. 225 (2015).
Leading the Way: An Efficient Multi-robot Guidance System	2015
Piyush Khandelwal, Samuel Barrett, and Peter Stone, In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Istanbul, Turkey, May 2015.
Learning from Human-Generated Reward	2012
W. Bradley Knox, No other information
TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains	2012
Todd Hester, PhD Thesis, The University of Texas at Austin. Code available at: http://www.ros.org/wiki/rl-texplore-ros-pkg.
The Nature of Belief-Directed Exploratory Choice in Human Decision-Making	2012
W. Bradley Knox, A. Ross Otto, Peter Stone, and Bradley Love, Frontiers in Psychology, Vol. 2 (2012). The paper can be accessed at: http://www.frontiersin.org/Journal/Abstract.aspx?s=196&name=cognitive_science&ART_DOI=10.3389/fpsyg.2011.00398.
Learning Complementary Multiagent Behaviors: A Case Study	2009
Shivaram Kalyanakrishnan and Peter Stone, In Proceedings of the RoboCup International Symposium 2009. Springer Verlag.
Bayesian Models of Nonstationary Markov Decision Problems	2005
Nicholas K. Jong and Peter Stone, In IJCAI 2005 workshop on Planning and Learning in A Priori Unknown or Dynamic Domains, August 2005.
Improving Action Selection in MDP's via Knowledge Transfer	2005
Alexander A. Sherstov and Peter Stone, In Proceedings of the Twentieth National Conference on Artificial Intelligence, July 2005.
State Abstraction Discovery from Irrelevant State Variables	2005
Nicholas K. Jong and Peter Stone, In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, pp. 752-757, August 2005.
Towards Learning to Ignore Irrelevant State Variables	2004
Nicholas K. Jong and Peter Stone, In The AAAI-2004 Workshop on Learning and Planning in Markov Processes -- Advances and Challenges, 2004.

Projects

TEXPLORE: Real-Time Sample Efficient Reinforcement Learning

2009 - Present

Demos

TEXPLORE: Real-Time Sample Efficient Reinforcement Learning

Todd Hester

2012

Labs

Learning Agents