Generalized Domains for Empirical Evaluations in Reinforcement Learning (2009)
Many empirical results in reinforcement learning are based on a very small set of environments. These results often reflect the best-performing algorithm parameters found through an ad hoc tuning or fitting process. We argue that presenting tuned scores from a small set of environments leads to method overfitting, wherein results may not generalize to similar environments. To address this problem, we advocate empirical evaluations using generalized domains: parameterized problem generators that explicitly encode variations in the environment to which the learner should be robust. We argue that evaluating across a set of these generated problems offers a more meaningful evaluation of reinforcement learning algorithms.
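As a rough illustration of the idea, and not the paper's own setup, the minimal Python sketch below treats a generalized domain as a problem generator. A hypothetical `generate_domain` samples chain-world MDPs whose length and action-slip probability vary. A parameter setting of tabular Q-learning is then scored by its mean return across the generated set instead of on one fixed instance. The chain environment, the parameter ranges, the reward of -1 per step, and every function name here are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ChainMDP:
    """Hypothetical chain environment: states 0..n-1, episode ends at the last state."""
    n_states: int
    slip: float  # probability that the chosen move is reversed

    def step(self, state, action, rng):
        # Action 1 moves right, action 0 moves left; with prob `slip` the move is flipped.
        if rng.random() < self.slip:
            action = 1 - action
        nxt = min(state + 1, self.n_states - 1) if action == 1 else max(state - 1, 0)
        done = nxt == self.n_states - 1
        return nxt, -1.0, done  # -1 per step, so the return is minus the steps taken


def generate_domain(rng):
    """Hypothetical generalized-domain generator: samples the environment
    variations (length, slip) the learner is expected to be robust to."""
    return ChainMDP(n_states=rng.randint(5, 15), slip=rng.uniform(0.0, 0.3))


def run_q_learning(env, alpha, epsilon, episodes, rng, gamma=1.0, max_steps=200):
    """Tabular epsilon-greedy Q-learning; returns the mean episode return."""
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    returns = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for _ in range(max_steps):
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)  # explore, or break ties randomly
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2, r, done = env.step(s, a, rng)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s, total = s2, total + r
            if done:
                break
        returns.append(total)
    return mean(returns)


def evaluate_generalized(alpha, epsilon, n_envs=20, episodes=50, seed=0):
    """Score one parameter setting by its average return over sampled environments."""
    rng = random.Random(seed)
    envs = [generate_domain(rng) for _ in range(n_envs)]
    return mean(run_q_learning(env, alpha, epsilon, episodes, rng) for env in envs)


if __name__ == "__main__":
    single = ChainMDP(n_states=10, slip=0.1)  # one fixed instance, as in ad hoc tuning
    for alpha in (0.1, 0.5, 0.9):
        one = run_q_learning(single, alpha, 0.1, 50, random.Random(0))
        many = evaluate_generalized(alpha, 0.1)
        print(f"alpha={alpha}: single-env {one:.1f}, generalized {many:.1f}")
```

The design choice mirrors the argument above: the ranking of `alpha` values on the single fixed instance need not match their ranking on the generated set. Because the generator makes the tolerated variations explicit, reporting the generalized score discourages tuning to one environment's quirks.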
In ICML Workshop on Evaluation Methods for Machine Learning, June 2009. To appear.

Peter Stone Faculty pstone [at] cs utexas edu
Matthew Taylor Ph.D. Alumni taylorm [at] eecs wsu edu
Shimon Whiteson Ph.D. Alumni s a whiteson [at] uva nl