Shimon Whiteson, Matthew E. Taylor, and Peter Stone. Empirical Studies in Action Selection for Reinforcement Learning. Adaptive Behavior, 15(1):33–50, March 2007.
[PDF] 828.6kB  [postscript] 1.5MB
To excel in challenging tasks, intelligent agents need sophisticated mechanisms for action selection: they need policies that dictate what action to take in each situation. Reinforcement learning (RL) algorithms are designed to learn such policies given only positive and negative rewards. Two contrasting approaches to RL that are currently in popular use are temporal difference (TD) methods, which learn value functions, and evolutionary methods, which optimize populations of candidate policies. Both approaches have had practical successes but few studies have directly compared them. Hence, there are no general guidelines describing their relative strengths and weaknesses. In addition, there has been little cross-collaboration, with few attempts to make them work together or to apply ideas from one to the other. This article aims to address these shortcomings via three empirical studies that compare these methods and investigate new ways of making them work together. First, we compare the two approaches in a benchmark task and identify variations of the task that isolate factors critical to each method's performance. Second, we investigate ways to make evolutionary algorithms excel at on-line tasks by borrowing exploratory mechanisms traditionally used by TD methods. We present empirical results demonstrating a dramatic performance improvement. Third, we explore a novel way of making evolutionary and TD methods work together by using evolution to automatically discover good representations for TD function approximators. We present results demonstrating that this novel approach can outperform both TD and evolutionary methods alone.
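For readers unfamiliar with the two families of methods the abstract contrasts, the sketch below illustrates the basic distinction on a toy problem: a TD method (tabular Q-learning) learns a value function and acts greedily with respect to it, while an evolutionary method searches directly over complete policies. The five-state chain environment, the shaped fitness in evaluate(), and all hyperparameters are placeholders chosen for brevity; they are not the benchmark tasks, algorithms, or settings studied in the article.

    # Illustrative sketch only: tabular Q-learning (a TD method) and a tiny
    # evolutionary policy search, side by side on a toy five-state chain task.
    # The environment and every hyperparameter here are placeholders, not the
    # benchmark domains or settings studied in the article.
    import random

    N_STATES, N_ACTIONS = 5, 2

    def step(state, action):
        """Toy chain: action 1 moves right (goal at the far end), action 0 resets."""
        if action == 1:
            nxt = state + 1
            if nxt == N_STATES:
                return 0, 1.0, True       # reached the goal: reward 1, episode ends
            return nxt, 0.0, False
        return 0, 0.0, False              # action 0 sends the agent back to the start

    def argmax_random_tie(values):
        best = max(values)
        return random.choice([i for i, v in enumerate(values) if v == best])

    def td_qlearning(episodes=1000, alpha=0.2, gamma=0.95, epsilon=0.1, max_steps=30):
        """TD approach: learn a value function Q(s, a) and act (epsilon-)greedily on it."""
        Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        for _ in range(episodes):
            s = 0
            for _ in range(max_steps):
                a = (random.randrange(N_ACTIONS) if random.random() < epsilon
                     else argmax_random_tie(Q[s]))
                s2, r, done = step(s, a)
                target = r if done else r + gamma * max(Q[s2])
                Q[s][a] += alpha * (target - Q[s][a])   # TD update toward the target
                if done:
                    break
                s = s2
        return Q

    def evaluate(policy, max_steps=20):
        """Fitness of a direct policy (list mapping state -> action), with partial credit."""
        s, furthest = 0, 0
        for _ in range(max_steps):
            s, r, done = step(s, policy[s])
            if done:
                return 1.0                # reached the goal
            furthest = max(furthest, s)
        return furthest / N_STATES        # shaped fitness: how far right the policy got

    def evolve(generations=50, pop_size=10, mutation_rate=0.2):
        """Evolutionary approach: search directly in policy space, no value function."""
        pop = [[random.randrange(N_ACTIONS) for _ in range(N_STATES)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=evaluate, reverse=True)
            parents = pop[: pop_size // 2]            # truncation selection
            children = [[a if random.random() > mutation_rate
                         else random.randrange(N_ACTIONS) for a in p]
                        for p in parents]             # one mutated child per parent
            pop = parents + children
        return max(pop, key=evaluate)

    if __name__ == "__main__":
        Q = td_qlearning()
        td_policy = [argmax_random_tie(Q[s]) for s in range(N_STATES)]
        print("TD (Q-learning) policy:", td_policy, "return:", evaluate(td_policy))
        evo_policy = evolve()
        print("Evolved policy:        ", evo_policy, "return:", evaluate(evo_policy))

Both learners should recover the all-ones (always move right) policy on this toy chain; the point of the sketch is only the structural difference between learning a value function and searching over policies, not a performance comparison.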
@Article{AB07,
  author   = "Shimon Whiteson and Matthew E.\ Taylor and Peter Stone",
  title    = "Empirical Studies in Action Selection for Reinforcement Learning",
  journal  = "Adaptive Behavior",
  year     = "2007",
  volume   = "15",
  number   = "1",
  month    = "March",
  pages    = "33--50",
  abstract = "To excel in challenging tasks, intelligent agents need
    sophisticated mechanisms for action selection: they need policies that
    dictate what action to take in each situation. Reinforcement learning
    (RL) algorithms are designed to learn such policies given only positive
    and negative rewards. Two contrasting approaches to RL that are currently
    in popular use are temporal difference (TD) methods, which learn value
    functions, and evolutionary methods, which optimize populations of
    candidate policies. Both approaches have had practical successes but few
    studies have directly compared them. Hence, there are no general
    guidelines describing their relative strengths and weaknesses. In
    addition, there has been little cross-collaboration, with few attempts to
    make them work together or to apply ideas from one to the other. This
    article aims to address these shortcomings via three empirical studies
    that compare these methods and investigate new ways of making them work
    together. First, we compare the two approaches in a benchmark task and
    identify variations of the task that isolate factors critical to each
    method's performance. Second, we investigate ways to make evolutionary
    algorithms excel at on-line tasks by borrowing exploratory mechanisms
    traditionally used by TD methods. We present empirical results
    demonstrating a dramatic performance improvement. Third, we explore a
    novel way of making evolutionary and TD methods work together by using
    evolution to automatically discover good representations for TD function
    approximators. We present results demonstrating that this novel approach
    can outperform both TD and evolutionary methods alone.",
}
Generated by bib2html.pl (written by Patrick Riley) on Thu Oct 23, 2025 16:19:40