CS394R: Reinforcement Learning: Theory and Practice -- Spring 2013

Instructor: Peter Stone
Department of Computer Science

Wednesday 9am-noon
SZB 422

Jump to the assignments page.
Jump to the resources page.
Jump to the textbook page.
Jump to the on-line version of the textbook.

Please complete the midterm course evaluation survey.

Instructor Contact Information

office hours: Thursdays from 1pm-2pm (please let me know in advance if you're coming) or by appointment
office: GDC 3.508
phone: 471-9796
fax: 471-8885
email: pstone@cs.utexas.edu

Teaching Assistant

Sam Barrett
office hours: by appointment
office: GDC 3.424D
email: sbarrett@cs.utexas.edu

Course Description

"The idea that we learn by interacting with our environment is probably the first to occur to us when we think about the nature of learning. When an infant plays, waves its arms, or looks about, it has no explicit teacher, but it does have a direct sensori-motor connection to its environment. Exercising this connection produces a wealth of information about cause and effect, about the consequences of actions, and about what to do in order to achieve goals. Throughout our lives, such interactions are undoubtedly a major source of knowledge about our environment and ourselves. Whether we are learning to drive a car or to hold a conversation, we are all acutely aware of how our environment responds to what we do, and we seek to influence what happens through our behavior. Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence."

"Reinforcement learning is learning what to do---how to map situations to actions---so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward, but also the next situation and, through that, all subsequent rewards. These two characteristics---trial-and-error search and delayed reward---are the two most important distinguishing features of reinforcement learning."

These two paragraphs from chapter 1 of the course textbook describe the topic of this course. The course is an informal graduate seminar. There will be some assigned readings and discussions. The exact content of the course will be guided in part by the interests of the students. It will cover at least the first 9 chapters of the course textbook. Beyond that, we will either continue with the text or move to more advanced and/or recent readings from the field, with the aim of focusing on the practical successes and challenges of reinforcement learning.

There will be a programming component to the course in the form of a final project. Students will be expected to be proficient programmers.


Some background in artificial intelligence and strong programming skills are recommended.


The course textbook is:
Reinforcement Learning: An Introduction.
By Richard S. Sutton and Andrew G. Barto.
MIT Press, Cambridge, MA, 1998.
Note that the book is available on-line, though if you take the course, it's probably a book you'll want for your bookshelf. If so, please buy it through the GRACS site.


Reading, written, and programming assignments will be updated on the assignments page. A tentative schedule for the entire semester is posted, but the readings and exercises may change up until the Wednesday of the week before they are due (1 week in advance).


Slides from class and other relevant links and information are on the resources page. If you find something that should be added there, please email it to me.

Discussion Forum

While the Professor and the TA are glad to answer any questions you have, you will frequently find your peers to be an equally important resource in this class.

Please subscribe to our class Piazza page.

Course Requirements

Grades will be based on:

Written responses to the readings (15%):
By 1pm on the afternoon before a class with a new reading assignment due, everyone must submit a brief question or comment about the readings as an email in plain ASCII text. Please send it in the body of the email, rather than as an attachment. Please use the subject line "class readings for [due date]". In some cases, specific questions may be posted along with the readings. In general, though, the response is free form. Credit will be based on evidence that you have done the readings carefully. Acceptable responses include (but are not limited to):
  • Insightful questions;
  • Clarification questions about ambiguities;
  • Comments about the relation of the reading to previous readings;
  • Solutions to problems or exercises posed in the readings;
  • Critiques;
  • Thoughts on what you would like to learn about in more detail;
  • Possible extensions or related studies;
  • Thoughts on the paper's importance; and
  • Summaries of the most important things you learned.
Example successful responses from a previous class are available on the sample responses page.

Class participation (15%):
Students are expected to be present in class, to have completed the readings, and to participate actively in the discussions.

Oral presentation/discussion moderation (10%):
Each student will be expected to lead a discussion on one of the readings. The discussion can begin with a brief summary/overview of the important points in the readings, but the assumption should be that everyone has already completed the readings. The student may either present material related to the readings (perhaps from an outside source) or moderate a class discussion about the readings. In the latter case, the student must be prepared to keep the conversation flowing. Here are some tips on leading a discussion. You must present your plan for the discussion, including any slides you intend to show, to the Professor and TA at least three nights before your discussion (Sunday night). The discussions Google doc lists the schedule for student presentations/discussion moderation.

Preliminary programming exercises (4) (20%):
Each student will be required to complete four minor programming assignments of his/her own choosing. In most cases these will come from the exercises, though other options are possible upon consultation with the instructor. These exercises need not involve extensive or elaborate programs. The emphasis should be on empirically analyzing various learning algorithms and reporting on the results. The reports should be emailed to the instructor and TA, and all relevant code and data should be submitted, preferably as instructed below. Each student may choose when to complete these exercises and on what topic. However, at least three must be completed during the first half of the semester, at least one of which will be presented in class. It is recommended that the other be completed in conjunction with the student's oral presentation/discussion moderation. Upon completion, please submit using the turnin command on any CS machine. More complete directions are here.
Grading criteria for programming assignments (out of 10):
7 and 7.5 - Adequate, but did not go beyond the minimal analysis
8 and 8.5 - Good job, but there is room for improvement
9 and 9.5 - Good analysis, results well presented
10 - Excellent, with interesting research issues identified; goes beyond what was asked

Final programming project (40%):
A more extensive final programming project, along with a written report, will be due one week after the last day of class. Students will be expected to agree with the instructor on the topic of the project by about halfway through the semester. The report should be roughly equivalent to a conference paper in format, length, and style. Empirical results should be included to evaluate the approach. Please place a copy of your source code, your final report, and any other relevant data in a directory under /projects/agents3/class/spr13/ by one week after the last day of class. Please also put a hard copy (double-sided) of the report under the Professor's office door.

Related Courses

UTCS Reinforcement Learning Reading Group

Page maintained by Peter Stone
Questions? Send me mail