CS394R: Reinforcement Learning: Theory and Practice -- Spring 2011
Instructor: Peter Stone
Department of Computer Science
Tuesday, Thursday 12:30-1:45pm
(Note: the move to a larger room will allow students on the wait list
Jump to the assignments page.
Jump to the resources page.
Jump to the discussions page.
Jump to the textbook page.
Jump to the on-line version of the textbook.
Jump to the sample responses page.
Please complete the midterm course evaluation survey.
office hours: Thursdays from 11am-noon (please let me know in advance if you're coming) or by appointment
office: CSA 1.140
office hours: by appointment
office: ENS 32NEA
"The idea that we learn by interacting with our environment is probably
the first to occur to us when we think about the nature of
learning. When an infant plays, waves its arms, or looks about, it has
no explicit teacher, but it does have a direct sensori-motor
connection to its environment. Exercising this connection produces a
wealth of information about cause and effect, about the consequences
of actions, and about what to do in order to achieve goals. Throughout
our lives, such interactions are undoubtedly a major source of
knowledge about our environment and ourselves. Whether we are learning
to drive a car or to hold a conversation, we are all acutely aware of
how our environment responds to what we do, and we seek to influence
what happens through our behavior. Learning from interaction is a
foundational idea underlying nearly all theories of learning and intelligence."
"Reinforcement learning is learning what to do---how to map situations
to actions---so as to maximize a numerical reward signal. The learner
is not told which actions to take, as in most forms of machine
learning, but instead must discover which actions yield the most
reward by trying them. In the most interesting and challenging cases,
actions may affect not only the immediate reward, but also the next
situation and, through that, all subsequent rewards. These two
characteristics---trial-and-error search and delayed reward---are the
two most important distinguishing features of reinforcement learning."
These two paragraphs from chapter 1 of the course textbook describe
the topic of this course. The course is an informal graduate seminar.
There will be some assigned readings and discussions. The exact
content of the course will be guided in part by the interests of the
students. It will cover at least the first 9 chapters of the course
textbook. Beyond that, we will either continue with the text or move
to more advanced and/or recent readings from the field with an aim
towards focussing on the practical successes and challenges relating
to reinforcement learning.
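As a rough illustration of the trial-and-error search described in the
quotation above, the following minimal Python sketch runs an epsilon-greedy
learner on a small multi-armed bandit. It is not part of the course
materials: the function name, the reward means, and the parameter values are
all assumptions chosen for illustration. A bandit has no delayed reward, so
the sketch shows only the first of the two characteristics.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate the value of each arm by trial and error.

    true_means: hypothetical expected reward of each arm (hidden from the learner).
    Returns (value estimates, pull counts) after `steps` interactions.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per arm
    counts = [0] * n_arms       # number of times each arm was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random arm
        else:
            # exploit: pick the arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # The learner sees only a noisy numerical reward, never the correct action.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_epsilon_greedy([0.2, 0.5, 1.0])
    print("estimated values:", [round(v, 2) for v in estimates])
    print("pull counts:", counts)
```

The learner is never told which arm is best; it can only compare the rewards
it has observed, which is why some fraction of its actions (epsilon) is spent
exploring.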
There will be a programming component to the course in the form of a
final project. Students will be expected to be proficient programmers.
Some background in artificial intelligence and strong programming
skills are recommended.
The course textbook is:
Reinforcement Learning: An Introduction.
Richard S. Sutton and
Andrew G. Barto.
MIT Press, Cambridge, MA, 1998.
Note that the book is available
on-line, though if you take the course, it's probably a book you'll want for your bookshelf. If so, please buy it through the GRACS site.
Reading, written, and programming assignments will be updated on the
assignments page. A tentative schedule for the entire semester is posted, but the readings and exercises may change up until the Tuesday of the week before they are due (1 week in advance).
Slides from class and other relevant links and information are on the
resources page. If you find
something that should be added there, please email it to me.
Please subscribe to
the class mailing list. The listname is "cs394r-spr11".
Once you have subscribed to the list, you can send mail to the class.
Course information may be sent to this list. It is the student's
responsibility to be subscribed.
Grades will be based on:
- Written responses to the readings (15%):
By 9pm on the night before a class with a new reading assignment
due, everyone must submit a
brief question or comment about the readings as an email in plain
ascii text. Please send it in the body of the email,
rather than as an attachment. Please use the subject line "class
readings for [due date]". In some cases,
specific questions may be posted along with the readings. But in
general, it is free form. Credit will be based on evidence that you
have done the readings carefully. Acceptable responses include (but
are not limited to):
- Class participation (15%):
Students are expected to be present in class having completed the
readings and participate actively in the discussions.
- Oral presentation/discussion moderation (10%):
Each student will be expected to lead a discussion on one of the
readings. The discussion can begin with a brief summary/overview of
the important points in the readings, but it should assume that
everyone has already completed the readings. The student may either
present material related to the readings (perhaps from an outside
source) or moderate a class discussion about the readings. In the
latter case, the student must be prepared to keep the conversation
flowing. Here are some tips on leading
a discussion. If you would like feedback on your discussion
topic, please contact Peter and Doran (pstone@cs and chakrado@cs) by
9pm two nights before the discussion (Sunday or Tuesday). The
discussions page lists the schedule
for student presentations/discussion moderation.
- Preliminary programming exercises (4) (20%):
Each student will be required to complete four minor programming
assignments of his/her own choosing. In most cases these will come
from the exercises, though other options are possible upon
consultation with the instructor. These exercises need not involve
extensive or elaborate programs. The emphasis is to be on empirically
analyzing various learning algorithms and reporting on the results.
The reports should be emailed to the instructor and TA and all
relevant code and data should be submitted, preferably as instructed below. Each student may choose when to complete
these exercises and on what topic. However, at least three must be
completed during the first half of the semester. It is recommended
that the other be completed in conjunction with the student's oral
presentation/discussion moderation. Upon completion, please place
a copy of any relevant source code and data in a directory under
Grading criteria for programming assignments:
8 and 8.5 - Good job, but there is room for improvement
9 and 9.5 - Good analysis, results well presented
10 - Excellent, with interesting research issues identified; goes beyond what was asked
- Final programming project (40%):
A more extensive final programming project, along with written report,
will be due on the last day of class. Students will be expected to
agree with the instructor on the topic of the project by about halfway
through the semester. The report should be roughly equivalent to a
conference paper in format, length, and style. Empirical results
should be included to evaluate the approach. Please place a copy of
your source code, your final report, and any other relevant data in a
directory under /projects/agents3/class/spr11/ by
class time on the last day of class. Please also bring a hard
copy (double-sided) of the report to class.
Related Courses Elsewhere
UTCS Reinforcement Learning Reading Group
The UTCS Reinforcement Learning Reading Group is a student-run group that meets bi-weekly to discuss papers related to reinforcement learning. The RL Reading Group web page also provides a repository of past readings.