CS394R: Reinforcement Learning: Theory and Practice -- Fall 2016
Instructor: Peter Stone
Department of Computer Science
Tuesday, Thursday 9:30-11:00am
Jump to the assignments page.
Jump to the resources page.
Jump to the textbook page.
Please complete the midterm course evaluation survey.
office hours: TBA and by appointment
office: GDC 3.508
office hours: TBA and by appointment
office: GDC 3.424E
"The idea that we learn by interacting with our environment is probably
the first to occur to us when we think about the nature of
learning. When an infant plays, waves its arms, or looks about, it has
no explicit teacher, but it does have a direct sensori-motor
connection to its environment. Exercising this connection produces a
wealth of information about cause and effect, about the consequences
of actions, and about what to do in order to achieve goals. Throughout
our lives, such interactions are undoubtedly a major source of
knowledge about our environment and ourselves. Whether we are learning
to drive a car or to hold a conversation, we are all acutely aware of
how our environment responds to what we do, and we seek to influence
what happens through our behavior. Learning from interaction is a
foundational idea underlying nearly all theories of learning and
intelligence."
"Reinforcement learning problems involve learning what to do --- how to
map situations to actions --- so as to maximize a numerical reward
signal. In an essential way these are closed-loop problems because
the learning system's actions influence its later inputs. Moreover,
the learner is not told which actions to take, as in many forms of
machine learning, but instead must discover which actions yield the
most reward by trying them out. In the most interesting and
challenging cases, actions may affect not only the immediate reward but
also the next situation and, through that, all subsequent rewards.
These three characteristics --- being closed-loop in an essential way,
not having direct instructions as to what actions to take, and where
the consequences of actions, including reward signals, play out over
extended time periods --- are the three most important distinguishing
features of the reinforcement learning problem."
These two paragraphs from chapter 1 of the course textbook describe
the topic of this course. The course is a graduate seminar. There
will be some assigned readings and discussions. The exact content of
the course will be guided in part by the interests of the students.
It will cover at least the first 9 chapters of the (2nd edition of
the) course textbook. Beyond that, we will either continue with the
text or move to more advanced and/or recent readings from the field
with an aim towards focussing on the practical successes and
challenges relating to reinforcement learning.
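As a rough illustration of the trial-and-error idea in the quoted passage (the learner "must discover which actions yield the most reward by trying them out"), here is a minimal sketch of an epsilon-greedy learner on a toy multi-armed bandit. It is not course-provided code: the function name `run_bandit`, the reward means, and the default parameter values are all assumptions made for this example. It also leaves out the delayed consequences that the passage describes, since each action here affects only the immediate reward.

```python
import random


def run_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial-and-error on a toy multi-armed bandit.

    All names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    n_actions = len(true_means)
    estimates = [0.0] * n_actions      # running estimate of each action's reward
    counts = [0] * n_actions           # how often each action has been tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best current estimate.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment responds with a noisy numerical reward.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental sample-average update of the estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.0, 0.5, 2.0])
    print("estimated rewards:", [round(e, 2) for e in estimates])
    print("times each action was tried:", counts)
```

The learner is never told which action is best. It has only the rewards it observes, so it must balance trying actions it knows little about against repeating the one that currently looks best.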
There will be a programming component to the course in the form of a
final project. Students will be expected to be proficient programmers.
Some background in artificial intelligence and strong programming
skills are recommended.
The course textbook is:
Reinforcement Learning: An Introduction.
Richard S. Sutton and
Andrew G. Barto.
MIT Press, Cambridge, MA, 1998.
Note that the book is available
As much as possible, we will be using the 2nd edition of the book,
which is available in draft form from that webpage.
Reading, written, and programming assignments will be updated on the
assignments page. A tentative schedule for the entire semester is posted, but the readings and exercises may change up until the Wednesday of the week before they are due (1 week in advance).
Slides from class and other relevant links and information are on the
resources page. If you find
something that should be added there, please email it to me.
While the Professor and the TA would be glad to answer any questions you have,
you would frequently find your peers to be an equally important resource in
Please subscribe to our class piazza page.
Grades will be based on:
- Written responses to the readings (10%):
By 5pm on the afternoon before a class with a new reading assignment
due, everyone must submit a
brief question or comment about the readings as an email in plain
ASCII text. Please send it in the body of the email,
rather than as an attachment. Please use the subject line "class
readings for [due date]". In some cases,
specific questions may be posted along with the readings. But in
general, it is free form. Credit will be based on evidence that you
have done the readings carefully. Acceptable responses include (but
are not limited to):
Example successful responses from a previous class are available on the sample responses page.
These responses will be graded on a 10-point scale, with 9 being a typical full-credit grade. Responses are due by 5pm on Monday.
Responses received between then and 8:00a.m. on Tuesday
will lose 1 point (for a maximum score of 9). Responses
received between then and 8:00a.m. on Thursday will
lose 2 points (for a maximum score of 8). Responses received
after that will lose 4 points (for a maximum score of 6).
These deadlines are designed both to encourage you to do the readings
before class and also to allow us to incorporate some of your
responses into the class discussions.
- Class participation (10%):
Students are expected to be present in class having completed the
readings and participate actively in the discussions.
- Oral presentation/discussion moderation (10%):
Each student will be expected to lead a discussion on one of the
readings. The discussion can begin with a brief summary/overview of
the important points in the readings, but the assumption is that
everyone has already completed the readings. The student may either
present material related to the readings (perhaps from an outside
source) or moderate a class discussion about the readings. In the
latter case, the student must be prepared to keep the conversation
flowing. Here are some tips on leading
a discussion. You must present your plan for
the discussion, including any slides you intend to show, to the
Professor and TA at least two nights prior to your discussion
(Sunday or Tuesday night, depending on what day you're presenting).
Sign ups for discussion slots can be found
Sign up for one slot -- no day should get 2 people unless necessary.
- Preliminary programming exercises (4) (30%):
Each student will be required to complete four minor programming
assignments of his/her own choosing. In most cases these will come
from the exercises, though other options are possible upon
consultation with the instructor. These exercises need not involve
extensive or elaborate programs. The emphasis is on empirically
analyzing various learning algorithms and reporting on the results.
The reports should be emailed to the instructor and TA, and all
relevant code and data should be submitted on Canvas. Each student may choose when to complete
these exercises and on what topic. However, at least three must be
completed during the first half of the semester, at least one of which will be presented in class. It is recommended
that the other be completed in conjunction with the student's oral
Grading criteria for programming assignments (out of 10):
7 and 7.5 - Adequate, but really didn't go beyond the minimal analysis Example
8 and 8.5 - Good job, but there is room for improvement Example
9 and 9.5 - Good analysis, results well presented Example
10 - Excellent, with interesting research issues identified. Doing more than what has been asked. Example
From the TA: My general rubric is based on 3 things (keep in mind this is only a rough guide of what I'm looking for):
(1) Clarity - How clearly were the experiment, hypothesis, and results motivated and explained?
This is usually the "first cut," and reports that weren't clear usually won't pass the 8.5 mark (and it drops from there depending how unclear). If I'm confused about what you did, then the rest of the report usually doesn't go well. While we're not evaluating your English, it does still make a difference.
(2) Insightfulness - How interesting was the question, and how much "digging" was involved?
For example, a lot of people did "parameter sweep" experiments using some algorithm. This is fine, but it doesn't really "transfer." Knowing that some setting of parameters worked in some grid world doesn't necessarily help you on a new problem. There was one report that looked at how the structure of the domain (e.g. connectedness, etc.) suggested parameter choices, and that justified bumping it up. As another example, sometimes, experiments would give strange results (for example, the most common one was the Gambler's problem). The degree to which they investigated the reason for those results also factored into the score. This is the second cut, and is what mostly separates the 9.5 and up from below.
(3) Effort - This is pretty subjective, as there usually isn't a "right" or "wrong" answer to the assignments. See the example reports above.
- Final programming project (40%):
A more extensive final programming project, along with written report,
will be due one week after the last day of class. Students will be expected to
agree with the instructor on the topic of the project by about halfway
through the semester. The report should be roughly equivalent to a
conference paper in format, length, and style. If it's an application-oriented project, empirical results
with statistical significance analysis should be included to evaluate the approach. Please upload a copy of
your source code, your final report, and any other relevant data
one week after the last day of class. Please also
send an email with a pdf version of the report attached to both the Professor and TA at that time.
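The grading arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not an official grade calculator. The names `LATE_DEDUCTION`, `WEIGHTS`, `response_score`, and `course_percentage` are hypothetical. The sketch assumes the components are combined as a simple weighted sum and that a response score cannot go below zero, neither of which is stated above.

```python
# Points deducted (out of 10) for each submission window for reading responses.
LATE_DEDUCTION = {
    "by_mon_5pm": 0,      # on time
    "by_tue_8am": 1,      # maximum score of 9
    "by_thu_8am": 2,      # maximum score of 8
    "after_thu_8am": 4,   # maximum score of 6
}

# Component weights in percent, as listed under "Grades will be based on".
WEIGHTS = {
    "responses": 10,
    "participation": 10,
    "presentation": 10,
    "exercises": 30,
    "final_project": 40,
}


def response_score(raw_score, window):
    """Apply the late deduction to a raw 0-10 score (floor of 0 is an assumption)."""
    return max(0, raw_score - LATE_DEDUCTION[window])


def course_percentage(fractions):
    """Weighted sum of per-component fractions, each between 0 and 1 (assumed scheme)."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


if __name__ == "__main__":
    # A response worth 9 points that arrives before 8:00a.m. on Tuesday.
    print(response_score(9, "by_tue_8am"))
    # Hypothetical per-component fractions for one student.
    print(course_percentage({
        "responses": 0.9,
        "participation": 1.0,
        "presentation": 0.9,
        "exercises": 0.85,
        "final_project": 0.9,
    }))
```

Because the weights sum to 100, the result of `course_percentage` reads directly as a percentage of the total course grade.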
UTCS Reinforcement Learning Reading Group
The UTCS Reinforcement Learning Reading Group is a student run
group that meets bi-weekly to discuss papers related to
reinforcement learning. The RL
Reading Group web page also provides a repository of past
- Here's an RL reading list from Shivaram Kalyanakrishnan.
- Csaba Szepesvari's list of RL applications