This paper introduces Learn Structure and Exploit RMax (LSE-RMax), a novel model-based structure learning algorithm for ergodic factored-state MDPs. Given a planning horizon that satisfies a condition, LSE-RMax provably guarantees a return very close to the optimal return with high certainty, without requiring any prior knowledge of the in-degree of the transition function as input. LSE-RMax is fully implemented, and we provide a thorough analysis of its sample complexity. We also present empirical results demonstrating its effectiveness compared to prior approaches to the problem.
Citation:
In Proceedings of the Twenty Eighth International Conference on Machine Learning (ICML'11), June 2011.
Doran Chakraborty, Ph.D. Alumni, chakrado [at] cs utexas edu
Peter Stone, Faculty, pstone [at] cs utexas edu