UTCS Artificial Intelligence
Markov Decision Processes
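A Markov Decision Process (MDP) models sequential decision making in terms of states, actions, transition probabilities, and rewards. It is the formalism underlying modern value-function-based reinforcement learning.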
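To make "value-function-based" concrete, here is a minimal sketch of value iteration, one standard dynamic-programming method for computing optimal state values when the MDP's transitions and rewards are known. It repeatedly applies the Bellman optimality backup V(s) <- max_a sum_{s'} P(s'|s,a) [R(s,a,s') + gamma V(s')] until the values stop changing, then reads off a greedy policy. The two-state MDP (`TOY_MDP`), its rewards, the discount `gamma = 0.9`, and the helper name `value_iteration` are hypothetical, chosen only for illustration; this is not code from any of the projects listed on this page.

```python
from typing import Dict, List, Tuple

# A finite MDP encoded as: transitions[s][a] = list of (probability, next_state, reward).
# Hypothetical toy MDP for illustration only.
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

TOY_MDP: Transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "work": [(0.8, "high", 1.0), (0.2, "low", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 2.0)],
        "work": [(1.0, "high", 1.0)],
    },
}


def value_iteration(
    mdp: Transitions, gamma: float = 0.9, tol: float = 1e-8
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Apply the Bellman optimality backup until the largest change is below tol.

    Returns the (approximately) optimal state values and a greedy policy.
    """
    values = {s: 0.0 for s in mdp}

    def q(s: str, a: str, v: Dict[str, float]) -> float:
        # Expected one-step return of action a in state s, bootstrapping from v.
        return sum(p * (r + gamma * v[s2]) for p, s2, r in mdp[s][a])

    while True:
        new_values = {s: max(q(s, a, values) for a in mdp[s]) for s in mdp}
        delta = max(abs(new_values[s] - values[s]) for s in mdp)
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {s: max(mdp[s], key=lambda a: q(s, a, values)) for s in mdp}
    return values, policy


if __name__ == "__main__":
    v, pi = value_iteration(TOY_MDP)
    for s in TOY_MDP:
        print(f"{s}: V={v[s]:.3f}, action={pi[s]}")
```

Because the backup is a gamma-contraction, the loop is guaranteed to converge for gamma < 1. In this toy example the fixed point is V(high) = 2 / (1 - 0.9) = 20 with action "wait", and V(low) = 15.2 / 0.82 (about 18.54) with action "work". Reinforcement-learning methods such as Q-learning estimate the same kind of values from sampled transitions instead of requiring the model up front.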
People
Elad Liebman
Ph.D. Student
eladlieb [at] cs utexas edu
Jacob Menashe
Ph.D. Student
jmenashe [at] cs utexas edu
Sanmit Narvekar
Ph.D. Student
sanmit [at] cs utexas edu
Peter Stone
Faculty
pstone [at] cs utexas edu
Publications
On Sampling Error in Batch Action-Value Prediction Algorithms
2020
Brahma S. Pavse, Josiah P. Hanna, Ishan Durugkar, and Peter Stone. In the Offline Reinforcement Learning Workshop at Neural Information Processing Systems (NeurIPS), Remote (Virtual Conference), December 2020.
Reducing Sampling Error in Batch Temporal Difference Learning
2020
Brahma Pavse, Ishan Durugkar, Josiah Hanna, and Peter Stone. In Proceedings of the 37th International Conference on Machine Learning (ICML), Vienna, Austria (Virtual Conference), July 2020.
Building Self-Play Curricula Online by Playing with Expert Agents in Adversarial Games
2019
Felipe Leno Da Silva, Anna Helena Reali Costa, and Peter Stone. In Proceedings of the 8th Brazilian Conference on Intelligent Systems (BRACIS), Salvador, Bahia, Brazil, October 2019.
On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search
2016
Piyush Khandelwal, Elad Liebman, Scott Niekum, and Peter Stone. In Proceedings of the 33rd International Conference on Machine Learning, pp. 1319-1328, New York City, NY, USA, June 2016.
Autonomous Trading in Modern Electricity Markets
2015
Daniel Urieli, PhD Thesis, Department of Computer Sciences, The University of Texas at Austin. Code and binaries available at: http://www.cs.utexas.edu/~urieli/thesis.
Deep Recurrent Q-Learning for Partially Observable MDPs
2015
Matthew Hausknecht and Peter Stone. In AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (AAAI-SDMIA15), Arlington, Virginia, USA, November 2015.
Framing reinforcement learning from human reward: Reward positivity, temporal discounting, episodicity, and performance
2015
W. Bradley Knox and Peter Stone. Artificial Intelligence, Vol. 225 (2015).
Leading the Way: An Efficient Multi-robot Guidance System
2015
Piyush Khandelwal, Samuel Barrett, and Peter Stone. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Istanbul, Turkey, May 2015.
Learning from Human-Generated Reward
2012
W. Bradley Knox.
TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains
2012
Todd Hester, PhD Thesis, The University of Texas at Austin. Code available at: http://www.ros.org/wiki/rl-texplore-ros-pkg.
The Nature of Belief-Directed Exploratory Choice in Human Decision-Making
2012
W. Bradley Knox, A. Ross Otto, Peter Stone, and Bradley Love. Frontiers in Psychology, Vol. 2 (2012). The paper can be accessed at: http://www.frontiersin.org/Journal/Abstract.aspx?s=196&name=cognitive_science&ART_DOI=10.3389/fpsyg.2011.00398.
Learning Complementary Multiagent Behaviors: A Case Study
2009
Shivaram Kalyanakrishnan and Peter Stone. In Proceedings of the RoboCup International Symposium 2009, Springer Verlag, 2009.
Bayesian Models of Nonstationary Markov Decision Problems
2005
Nicholas K. Jong and Peter Stone. In IJCAI 2005 Workshop on Planning and Learning in A Priori Unknown or Dynamic Domains, August 2005.
Improving Action Selection in MDP's via Knowledge Transfer
2005
Alexander A. Sherstov and Peter Stone. In Proceedings of the Twentieth National Conference on Artificial Intelligence, July 2005.
State Abstraction Discovery from Irrelevant State Variables
2005
Nicholas K. Jong and Peter Stone. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, pp. 752-757, August 2005.
Towards Learning to Ignore Irrelevant State Variables
2004
Nicholas K. Jong and Peter Stone. In The AAAI-2004 Workshop on Learning and Planning in Markov Processes: Advances and Challenges, 2004.
Projects
TEXPLORE: Real-Time Sample Efficient Reinforcement Learning
2009 - Present
Demos
TEXPLORE: Real-Time Sample Efficient Reinforcement Learning
Todd Hester
2012
Labs
Learning Agents