Mark Ring, Tom Schaul. Q-error as a Selection Mechanism in Modular
Reinforcement-Learning Systems. Proceedings of the
International Joint Conference on Artificial Intelligence
(IJCAI-2011, Barcelona), 2011.
Abstract
This paper introduces a novel multi-modular method for reinforcement
learning. A multi-modular system is one that partitions the learning
task among a set of experts (modules), where each expert is
incapable of solving the entire task by itself. There are many
advantages to splitting up large tasks in this way, but existing
methods face difficulties when choosing which module(s) should
contribute to the agent's actions at any particular moment. We
introduce a selection mechanism in which every module, besides
calculating a set of action values, also estimates its own error for
the current input. The selection mechanism combines each module's
estimate of long-term reward and self-error to produce a score by
which the next module is chosen. As a result, the modules can use
their resources effectively and efficiently divide up the task. The
system is shown to learn complex tasks even when the individual
modules use only linear function approximators.
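
The selection idea described above can be illustrated with a minimal sketch. The abstract does not give the exact rule for combining a module's reward estimate with its self-error estimate, so the score used here (best action value minus predicted error) is only an assumption for illustration, as are the names LinearModule, action_values, error_estimate, and select_module. Learning updates are omitted; the sketch shows selection only, with linear function approximators as in the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LinearModule:
    """Hypothetical module: linear action values plus a linear self-error estimate."""
    q_weights: List[List[float]]  # one weight vector per action
    err_weights: List[float]      # weights predicting the module's own error

    def action_values(self, x: Sequence[float]) -> List[float]:
        # One linear estimate of long-term reward per action.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.q_weights]

    def error_estimate(self, x: Sequence[float]) -> float:
        # Predicted magnitude of the module's own error for input x.
        return abs(sum(w * xi for w, xi in zip(self.err_weights, x)))


def select_module(modules: Sequence[LinearModule],
                  x: Sequence[float]) -> Tuple[int, int]:
    """Return (module index, action index) with the highest score.

    ASSUMPTION: score = best action value - predicted self-error.
    The combination rule in the paper may differ.
    """
    best = None
    for i, m in enumerate(modules):
        q = m.action_values(x)
        a = max(range(len(q)), key=q.__getitem__)  # greedy action of module i
        score = q[a] - m.error_estimate(x)
        if best is None or score > best[0]:
            best = (score, i, a)
    return best[1], best[2]


# Module 0 promises more reward but predicts a large error when x[1] is active.
modules = [
    LinearModule(q_weights=[[1.0, 0.0], [0.0, 1.0]], err_weights=[0.0, 5.0]),
    LinearModule(q_weights=[[0.5, 0.0], [0.0, 0.5]], err_weights=[0.0, 0.0]),
]
choice_a = select_module(modules, [1.0, 0.0])
choice_b = select_module(modules, [1.0, 1.0])
```

Under this assumed rule, a module that promises high reward is still passed over for inputs where it expects to be wrong, which is one way the modules could end up dividing the task among themselves.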