Composing Efficient, Robust Tests for Policy Selection.
Dustin Morrill, Thomas J. Walsh, Daniel Hernandez,
Peter R. Wurman, and Peter Stone.
In
The 39th Conference on Uncertainty in Artificial Intelligence (UAI), August 2023.
short video presentation, poster
Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, an algorithm to select a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST treats the test case selection problem as a two-player game and optimizes a solution with provable $k$-of-$N$ robustness, bounding the error relative to a test that used all the test cases in the pool. Empirical results demonstrate that RPOSST finds a small set of test cases that identify high-quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.
@InProceedings{UAI23,
author={Dustin Morrill and Thomas J.\ Walsh and Daniel Hernandez and Peter R.\ Wurman and Peter Stone},
title={Composing Efficient, Robust Tests for Policy Selection},
booktitle={The 39th Conference on Uncertainty in Artificial Intelligence (UAI)},
location={Pittsburgh, PA, USA},
month={August},
year={2023},
abstract={
Modern reinforcement learning systems produce many
high-quality policies throughout the learning
process. However, to choose which policy to actually deploy
in the real world, they must be tested under an intractable
number of environmental conditions. We introduce RPOSST, an
algorithm to select a small set of test cases from a larger
pool based on a relatively small number of sample
evaluations. RPOSST treats the test case selection problem
as a two-player game and optimizes a solution with provable
$k$-of-$N$ robustness, bounding the error relative to a test
that used all the test cases in the pool. Empirical results
demonstrate that RPOSST finds a small set of test cases that
identify high-quality policies in a toy one-shot game, poker
datasets, and a high-fidelity racing simulator.
},
wwwnote={<a href="https://youtu.be/XkC9QR3Dil8">short video presentation</a>, <a href="https://www.auai.org/uai2023/posters/257.pdf">poster</a>},
}