Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning.
Shimon Whiteson, Brian Tanner, Matthew E. Taylor, and Peter Stone.
In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), April 2011.
2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Empirical evaluations play an important role in machine learning. However, the usefulness of any evaluation depends on the empirical methodology employed. Designing good empirical methodologies is difficult in part because agents can overfit test evaluations and thereby obtain misleadingly high scores. We argue that reinforcement learning is particularly vulnerable to environment overfitting and propose as a remedy generalized methodologies, in which evaluations are based on multiple environments sampled from a distribution. In addition, we consider how to summarize performance when scores from different environments may not have commensurate values. Finally, we present proof-of-concept results demonstrating how these methodologies can validate an intuitively useful range-adaptive tile coding method.
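As a rough illustration of the generalized methodology described in the abstract, the Python sketch below scores agents on several environments sampled from a distribution rather than on one fixed environment. Everything in it is an assumption made for illustration and is not taken from the paper: the toy three-armed bandit environments, the two policies, and the function names are all hypothetical. Because the sampled environments have very different reward scales, raw scores are not commensurate, so the sketch summarizes performance as the fraction of environments each agent wins. That is one possible scale-free summary, not necessarily the one the authors propose.

```python
import random


def sample_environment(rng):
    """Draw a hypothetical 3-armed bandit whose reward scale varies per draw."""
    scale = rng.choice([1.0, 10.0, 1000.0])
    return [rng.random() * scale for _ in range(3)]


def run_agent(policy, means, rng, steps=200):
    """Return the total reward a policy collects on one sampled bandit."""
    counts = [0] * len(means)
    totals = [0.0] * len(means)
    score = 0.0
    for _ in range(steps):
        arm = policy(counts, totals, rng)
        # Noise is proportional to the environment's scale.
        reward = rng.gauss(means[arm], 0.1 * max(means))
        counts[arm] += 1
        totals[arm] += reward
        score += reward
    return score


def random_policy(counts, totals, rng):
    """Baseline: pick an arm uniformly at random."""
    return rng.randrange(len(counts))


def epsilon_greedy_policy(counts, totals, rng, epsilon=0.1):
    """Try every arm once, then mostly exploit the best observed average."""
    for arm, count in enumerate(counts):
        if count == 0:
            return arm
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    averages = [t / c for t, c in zip(totals, counts)]
    return averages.index(max(averages))


def generalized_evaluation(policies, num_envs=50, seed=0):
    """Return, per policy, the fraction of sampled environments it wins."""
    rng = random.Random(seed)
    wins = {name: 0 for name in policies}
    for _ in range(num_envs):
        means = sample_environment(rng)
        scores = {name: run_agent(p, means, rng) for name, p in policies.items()}
        wins[max(scores, key=scores.get)] += 1
    return {name: w / num_envs for name, w in wins.items()}


result = generalized_evaluation(
    {"random": random_policy, "epsilon_greedy": epsilon_greedy_policy}
)
print(result)
```

Averaging raw scores here would let the environments with a reward scale of 1000 dominate the summary, which is why the sketch compares agents within each environment before aggregating.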
@InProceedings{ADPRL11-shimon,
author="Shimon Whiteson and Brian Tanner and Matthew E.\ Taylor and Peter Stone",
title="Protecting Against Evaluation Overfitting in Empirical Reinforcement Learning",
booktitle="{IEEE} Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)",
month="April",
year="2011",
abstract={
Empirical evaluations play an important role in
machine learning. However, the usefulness of any
evaluation depends on the \emph{empirical
methodology} employed. Designing good empirical
methodologies is difficult in part because agents
can \emph{overfit} test evaluations and thereby
obtain misleadingly high scores. We argue that
reinforcement learning is particularly vulnerable to
\emph{environment overfitting} and propose as a
remedy \emph{generalized methodologies}, in which
evaluations are based on multiple environments
sampled from a distribution. In addition, we
consider how to summarize performance when scores
from different environments may not have
commensurate values. Finally, we present
proof-of-concept results demonstrating how these
methodologies can validate an intuitively useful
range-adaptive tile coding method.
},
wwwnote={<a href="http://www.ieee-ssci.org/2011/adprl-2011">2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)</a>},
}