@COMMENT This file was generated by bib2html.pl version 0.90
@COMMENT written by Patrick Riley
@COMMENT This file came from Peter Stone's publication pages at
@COMMENT http://www.cs.utexas.edu/~pstone/papers
@Article{AB07,
author="Shimon Whiteson and Matthew E.\ Taylor and Peter Stone",
title="Empirical Studies in Action Selection for Reinforcement Learning",
journal="Adaptive Behavior",
year="2007",
volume="15",
number="1",
month="March",
pages="33--50",
abstract="
To excel in challenging tasks, intelligent agents
need sophisticated mechanisms for action selection:
they need policies that dictate what action to take
in each situation. Reinforcement learning (RL)
algorithms are designed to learn such policies given
only positive and negative rewards. Two contrasting
approaches to RL that are currently in popular use
are temporal difference (TD) methods, which learn
value functions, and evolutionary methods, which
optimize populations of candidate policies. Both
approaches have had practical successes, but few
studies have directly compared them. Hence, there
are no general guidelines describing their relative
strengths and weaknesses. In addition, there has
been little cross-collaboration, with few attempts
to make them work together or to apply ideas from
one to the other. This article aims to address
these shortcomings via three empirical studies that
compare these methods and investigate new ways of
making them work together.
First, we compare the two approaches in a benchmark
task and identify variations of the task that
isolate factors critical to each method's
performance. Second, we investigate ways to make
evolutionary algorithms excel at on-line tasks by
borrowing exploratory mechanisms traditionally used
by TD methods. We present empirical results
demonstrating a dramatic performance improvement.
Third, we explore a novel way of making evolutionary
and TD methods work together by using evolution to
automatically discover good representations for TD
function approximators. We present results
demonstrating that this novel approach can
outperform both TD and evolutionary methods alone.
",
}