@COMMENT This file was generated by bib2html.pl version 0.90
@COMMENT written by Patrick Riley
@COMMENT This file came from Peter Stone's publication pages at
@COMMENT http://www.cs.utexas.edu/~pstone/papers
@Article{AB07,
author="Shimon Whiteson and Matthew E.\ Taylor and Peter Stone",
title="Empirical Studies in Action Selection for Reinforcement Learning",
journal="Adaptive Behavior",
year="2007",
volume="15",
number="1",
month="March",
pages="33--50",
abstract="
To excel in challenging tasks, intelligent agents
need sophisticated mechanisms for action selection:
they need policies that dictate what action to take
in each situation. Reinforcement learning (RL)
algorithms are designed to learn such policies given
only positive and negative rewards. Two contrasting
approaches to RL that are currently in popular use
are temporal difference (TD) methods, which learn
value functions, and evolutionary methods, which
optimize populations of candidate policies. Both
approaches have had practical successes, but few
studies have directly compared them. Hence, there
are no general guidelines describing their relative
strengths and weaknesses. In addition, there has
been little cross-collaboration, with few attempts
to make them work together or to apply ideas from
one to the other. This article aims to address
these shortcomings via three empirical studies that
compare these methods and investigate new ways of
making them work together.
First, we compare the two approaches in a benchmark
task and identify variations of the task that
isolate factors critical to each method's
performance. Second, we investigate ways to make
evolutionary algorithms excel at on-line tasks by
borrowing exploratory mechanisms traditionally used
by TD methods. We present empirical results
demonstrating a dramatic performance improvement.
Third, we explore a novel way of making evolutionary
and TD methods work together by using evolution to
automatically discover good representations for TD
function approximators. We present results
demonstrating that this novel approach can
outperform both TD and evolutionary methods alone.
",
}