Peter Stone's Selected Publications



Deep Recurrent Q-Learning for Partially Observable MDPs

Matthew Hausknecht and Peter Stone. Deep Recurrent Q-Learning for Partially Observable MDPs. In AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (AAAI-SDMIA15), November 2015.

Download

[PDF]1.5MB  [slides.pdf]3.8MB  

Abstract

Deep Reinforcement Learning has yielded proficient controllers for complex tasks. However, these controllers have limited memory and rely on being able to perceive the complete game screen at each decision point. To address these shortcomings, this article investigates the effects of adding recurrency to a Deep Q-Network (DQN) by replacing the first post-convolutional fully-connected layer with a recurrent LSTM. The resulting Deep Recurrent Q-Network (DRQN), although capable of seeing only a single frame at each timestep, successfully integrates information through time and replicates DQN's performance on standard Atari games and partially observed equivalents featuring flickering game screens. Additionally, when trained with partial observations and evaluated with incrementally more complete observations, DRQN's performance scales as a function of observability. Conversely, when trained with full observations and evaluated with partial observations, DRQN's performance degrades less than DQN's. Thus, given the same length of history, recurrency is a viable alternative to stacking a history of frames in the DQN's input layer and while recurrency confers no systematic advantage when learning to play the game, the recurrent net can better adapt at evaluation time if the quality of observations changes.
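The core architectural idea, an LSTM that sees one frame's features at a time and carries history in its recurrent state rather than in a stack of input frames, can be illustrated with a minimal NumPy sketch. This is not the paper's network: the dimensions are toy values (the actual DRQN uses 84x84 Atari frames, three convolutional layers, and a 512-unit LSTM), the weights are random stand-ins for a trained model, and the convolutional feature extractor is abstracted away.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration, not the paper's values).
FEAT, HID, N_ACTIONS = 32, 16, 4

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialized weights standing in for a trained network.
W = rng.normal(0, 0.1, (4 * HID, FEAT + HID))  # fused LSTM gate weights
b = np.zeros(4 * HID)
W_q = rng.normal(0, 0.1, (N_ACTIONS, HID))     # linear Q-value head

def lstm_step(feat, h, c):
    """One LSTM step: combine the current frame's features with the
    recurrent state, so history lives in (h, c) instead of the input."""
    z = W @ np.concatenate([feat, h]) + b
    i, f, o, g = np.split(z, 4)                # input, forget, output, cell gates
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def drqn_q_values(frame_features):
    """Process a sequence of single-frame conv features; only the
    recurrent state integrates information through time."""
    h, c = np.zeros(HID), np.zeros(HID)
    for feat in frame_features:
        h, c = lstm_step(feat, h, c)
    return W_q @ h                             # one Q-value per action

episode = rng.normal(size=(10, FEAT))          # 10 frames of conv features
q = drqn_q_values(episode)
print(q.shape)                                 # (4,)
```

Because the network consumes one frame per step, dropping or flickering observations only perturbs individual LSTM inputs rather than corrupting a fixed frame stack, which is the mechanism behind the graceful degradation the abstract describes.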

BibTeX Entry

@InProceedings{SDMIA15-Hausknecht,
  author = {Matthew Hausknecht and Peter Stone},
  title = {Deep Recurrent Q-Learning for Partially Observable MDPs},
  booktitle = {AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents (AAAI-SDMIA15)},
  location = {Arlington, Virginia, USA},
  month = {November},
  year = {2015},
  abstract={
Deep Reinforcement Learning has yielded proficient controllers for
complex tasks. However, these controllers have limited memory and rely
on being able to perceive the complete game screen at each decision
point. To address these shortcomings, this article investigates the
effects of adding recurrency to a Deep Q-Network (DQN) by replacing
the first post-convolutional fully-connected layer with a recurrent
LSTM. The resulting Deep Recurrent Q-Network (DRQN), although
capable of seeing only a single frame at each timestep, successfully
integrates information through time and replicates DQN's performance
on standard Atari games and partially observed equivalents featuring
flickering game screens. Additionally, when trained with partial
observations and evaluated with incrementally more complete
observations, DRQN's performance scales as a function of
observability. Conversely, when trained with full observations and
evaluated with partial observations, DRQN's performance degrades less
than DQN's. Thus, given the same length of history, recurrency is a
viable alternative to stacking a history of frames in the DQN's input
layer and while recurrency confers no systematic advantage when learning
to play the game, the recurrent net can better adapt at evaluation
time if the quality of observations changes.
  },
}

Generated by bib2html.pl (written by Patrick Riley) on Mon Jul 06, 2020 17:00:36