Peter Stone's Selected Publications

Batch Reinforcement Learning in a Complex Domain

Shivaram Kalyanakrishnan and Peter Stone. Batch Reinforcement Learning in a Complex Domain. In The Sixth International Joint Conference on Autonomous Agents and Multiagent Systems, May 2007.
AAMAS-2007

Abstract

Temporal difference reinforcement learning algorithms are perfectly suited to autonomous agents because they learn directly from an agent's experience based on sequential actions in the environment. However, their most common algorithmic variants are relatively inefficient in their use of experience data, which in many agent-based settings can be scarce. In particular, they make just one learning "update" for each atomic experience. Batch reinforcement learning algorithms, on the other hand, aim to achieve greater data efficiency by saving experience data and using it in aggregate to make updates to the learned policy. Their success has been demonstrated in the past on simple domains like grid worlds and low-dimensional control applications like pole balancing. In this paper, we compare and contrast batch reinforcement learning algorithms with on-line algorithms based on their empirical performance in a complex, continuous, noisy, multiagent domain, namely RoboCup soccer Keepaway. We find that the two batch methods we consider, Experience Replay and Fitted Q Iteration, both yield significant gains in sample complexity, while achieving high asymptotic performance.
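
The contrast the abstract draws between one update per experience and reusing stored experience can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a tabular Q-learning update and a hypothetical three-step toy episode, whereas the paper works in Keepaway, a continuous domain that needs function approximation. The names `q_update`, `learn_online`, `learn_with_replay`, and `episode`, and the values of `alpha`, `gamma`, and `sweeps`, are illustrative assumptions. Fitted Q Iteration is not shown.

```python
from collections import defaultdict

ACTIONS = (0, 1)  # hypothetical action set for the toy problem


def q_update(q, transition, alpha=0.5, gamma=0.9):
    """Apply one temporal-difference (Q-learning) update for one transition."""
    s, a, r, s_next, done = transition
    if done:
        target = r
    else:
        target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def learn_online(transitions):
    """On-line learning: each experience is used for exactly one update."""
    q = defaultdict(float)
    for t in transitions:
        q_update(q, t)
    return q


def learn_with_replay(transitions, sweeps=20):
    """Experience Replay (tabular sketch): store experience, replay it repeatedly."""
    q = defaultdict(float)
    buffer = list(transitions)  # saved experience data
    for _ in range(sweeps):
        for t in buffer:
            q_update(q, t)
    return q


# Hypothetical toy data: one episode on a short chain, reward only at the end.
# Each tuple is (state, action, reward, next_state, done).
episode = [
    (0, 1, 0.0, 1, False),
    (1, 1, 0.0, 2, False),
    (2, 1, 1.0, 3, True),
]

q_online = learn_online(episode)
q_replay = learn_with_replay(episode)
```

With the same three transitions, the single on-line pass only moves the value of the final state-action pair, while replaying the stored episode lets the terminal reward propagate back to the start state. This is the sense in which batch methods extract more from scarce experience.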

BibTeX Entry

@InProceedings{AAMAS07-kalyanakrishnan,
        author="Shivaram Kalyanakrishnan and Peter Stone",
        title="Batch Reinforcement Learning in a Complex Domain",
        booktitle="The Sixth International Joint Conference on Autonomous Agents and Multiagent Systems",
        month="May",year="2007", 
        abstract={Temporal difference reinforcement learning
        algorithms are perfectly suited to autonomous agents because
        they learn directly from an agent's experience based on
        sequential actions in the environment. However, their most
        common algorithmic variants are relatively inefficient in
        their use of experience data, which in many agent-based
        settings can be scarce.  In particular, they make just one
        learning ``update'' for each atomic experience.  Batch
        reinforcement learning algorithms, on the other hand, aim to
        achieve greater data efficiency by saving experience data and
        using it in aggregate to make updates to the learned
        policy. Their success has been demonstrated in the past on
        simple domains like grid worlds and low-dimensional control
        applications like pole balancing. In this paper, we compare
        and contrast batch reinforcement learning algorithms with
        on-line algorithms based on their empirical performance in a
        complex, continuous, noisy, multiagent domain, namely RoboCup
        soccer Keepaway. We find that the two batch methods we
        consider, Experience Replay and Fitted Q Iteration, both yield
        significant gains in sample complexity, while achieving high
        asymptotic performance.},
       wwwnote={<a href="http://www.aamas2007.nl/">AAMAS-2007</a>},
}       
