UTCS Colloquium/AI: Richard S. Sutton/University of Alberta From Experience to Reason: Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping ACES 2.402 Tuesday June 24 2008 11:00 a.m.

Contact Name: 
Jenna Whitney
Date: 
Jun 24, 2008 11:00am - 12:00pm

There is a signup schedule for this event (UT EID required).
Type of Talk: UTCS Colloquium/AI

Speaker Name/Affiliation: Richard S. Sutton/University of Alberta

Date/Time: Tuesday June 24 2008 11:00 a.m.

Location: ACES 2.402

Host: Peter Stone

Talk Title: From Experience to Reason: Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping

Talk Abstract:
Understanding the world, representing its state and dynamics at multiple levels of abstraction, and being able to use this knowledge flexibly to achieve goals are key abilities sought in all approaches to artificial intelligence. In this talk I approach them from the point of view of reinforcement learning, which means an emphasis on learning and action and on how these interrelate with planning. In particular, I expand on the idea that learning and planning can be done simultaneously and by the same algorithm, operating either on real experience (learning) or on imagined experience (planning). This idea became popular in the 1990s under the name Dyna architecture, in part because it was one of very few planning systems that worked with a learned model of the world. However, a limitation of past work with the Dyna architecture was that it used a table-lookup form for the world model (in which every state was treated distinctly, without generalization), which does not scale to large problems. Scaling requires replacing the tables with parameterized function approximators. However, we now know that combining reinforcement learning with function approximation can become unstable when trained off-policy, i.e., with counterfactuals such as are inherent in planning. Our main new result is to establish conditions under which the stability of Dyna-style planning can be proved. Given this, we can also immediately establish the soundness of several natural generalizations of prioritized sweeping to linear function approximation. Prioritized sweeping is a search-control method for Dyna that focuses planning effort where it has the greatest effect, sometimes dramatically increasing planning efficiency. The resulting system is probably the most efficient online reinforcement-learning method known. I conclude by discussing extensions of the Dyna idea to temporally abstract courses of action (options) and to reasoning, that is, planning about subgoals other than the ultimate goal.
[This is joint work with Csaba Szepesvari, Alborz Geramifard, and Michael Bowling.]
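To make the abstract's central idea concrete (one update rule applied both to real experience and to experience imagined from a learned linear model, with prioritized sweeping choosing what to plan about), here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm or code. It assumes policy evaluation only (no action selection). It also assumes a linear world model in which the next feature vector is predicted as `F @ phi` and the reward as `b . phi`, learned by plain least-mean-squares updates. Planning uses unit feature vectors as imagined starting points, and features are queued with a priority rule chosen for illustration. The class name `LinearDyna` and all parameter names (`alpha`, `model_alpha`, `gamma`, `min_priority`) are hypothetical. The stability conditions mentioned in the abstract are not reproduced here.

```python
import heapq


class LinearDyna:
    """Hypothetical sketch: linear Dyna-style policy evaluation with a
    prioritized-sweeping planner. Names, step sizes, and the priority rule
    are illustrative assumptions, not the method presented in the talk."""

    def __init__(self, n, alpha=0.1, model_alpha=0.1, gamma=0.9, min_priority=1e-6):
        self.n = n
        self.alpha = alpha                  # step size for value weights
        self.model_alpha = model_alpha      # step size for the linear model
        self.gamma = gamma                  # discount factor
        self.min_priority = min_priority    # skip negligible priorities
        self.theta = [0.0] * n              # V(s) is approximated by theta . phi(s)
        self.F = [[0.0] * n for _ in range(n)]  # model: next phi approximated by F phi
        self.b = [0.0] * n                  # model: reward approximated by b . phi
        self.queue = []                     # min-heap of (-priority, feature index)

    @staticmethod
    def _dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    def _td_update(self, x, r, y):
        # Linear TD(0): the same rule is used for real and imagined transitions.
        delta = r + self.gamma * self._dot(self.theta, y) - self._dot(self.theta, x)
        for i in range(self.n):
            self.theta[i] += self.alpha * delta * x[i]
        return delta

    def _push(self, priority, i):
        # Duplicate entries for the same feature are tolerated for simplicity.
        if priority > self.min_priority:
            heapq.heappush(self.queue, (-priority, i))

    def observe(self, phi, r, phi_next):
        """Learning: one TD step and one model update on a real transition."""
        delta = self._td_update(phi, r, phi_next)
        # Least-mean-squares updates of the model F (features) and b (reward).
        pred = [self._dot(row, phi) for row in self.F]
        for i in range(self.n):
            err = phi_next[i] - pred[i]
            for j in range(self.n):
                self.F[i][j] += self.model_alpha * err * phi[j]
        r_err = r - self._dot(self.b, phi)
        for j in range(self.n):
            self.b[j] += self.model_alpha * r_err * phi[j]
        # Queue active features, prioritized by the size of their TD update.
        for i in range(self.n):
            if phi[i] != 0.0:
                self._push(abs(delta * phi[i]), i)

    def plan(self, steps):
        """Planning: TD steps on imagined transitions generated by the model."""
        for _ in range(steps):
            if not self.queue:
                break
            _, i = heapq.heappop(self.queue)
            x = [1.0 if k == i else 0.0 for k in range(self.n)]  # unit vector e_i
            y = [self.F[k][i] for k in range(self.n)]            # imagined next features F e_i
            r = self.b[i]                                        # imagined reward b . e_i
            delta = self._td_update(x, r, y)
            # Feature j is a predecessor of i when F[i][j] != 0 (j predicts i next);
            # its TD target just changed, so queue it.
            for j in range(self.n):
                if self.F[i][j] != 0.0:
                    self._push(abs(self.F[i][j] * delta), j)


# Usage: a two-state chain with one-hot features.
# State 0 -> state 1 (reward 0), state 1 -> terminal (reward 1).
agent = LinearDyna(2, alpha=0.5, model_alpha=0.5, gamma=0.9)
for _ in range(200):
    agent.observe([1.0, 0.0], 0.0, [0.0, 1.0])
    agent.observe([0.0, 1.0], 1.0, [0.0, 0.0])
    agent.plan(10)
print([round(w, 3) for w in agent.theta])  # expected: [0.9, 1.0]
```

The design choice worth noting is that `_td_update` is shared by `observe` and `plan`. This mirrors the abstract's point that learning and planning can be one algorithm fed by either real or model-generated experience. Replacing `F` and `b` with lookup tables would recover the table-based Dyna that the abstract describes as not scaling.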

Speaker Bio:
Richard S. Sutton is a professor and iCORE chair in the department of computing science at the University of Alberta. He is a fellow of the American Association for Artificial Intelligence and co-author of the textbook Reinforcement Learning: An Introduction from MIT Press.
Before joining the University of Alberta in 2003, he worked in industry at AT&T and GTE Labs, and in academia at the University of Massachusetts.
He received a PhD in computer science from the University of Massachusetts in 1984 and a BA in psychology from Stanford University in 1978.
Rich's research interests center on the learning problems facing a decision-maker interacting with its environment, which he sees as central to artificial intelligence. He is also interested in animal learning psychology, in connectionist networks, and generally in systems that continually improve their representations and models of the world.