UTCS Reinforcement Learning Reading Group

The UTCS Reinforcement Learning Reading Group is a student-run group that discusses research papers related to reinforcement learning. Ever since its first meeting in the spring of 2004, the group has served as a forum for students to discuss interesting research ideas in an informal setting. Meetings are usually held in the afternoon and refreshments are provided. Occasionally, the group hosts invited talks.

The group is currently coordinated by Ishan Durugkar and Sai Kiran Narayanaswami. The previous (glorious) coordinators are:

Elad Liebman (Fall 2012 - Spring 2019)
Matthew Hausknecht (Fall 2011 - Fall 2012)
Shivaram Kalyanakrishnan (Spring 2006 - Spring 2011)
Matt Taylor (Spring 2004 - Fall 2005)

This page provides information about group meetings, lists useful resources for reinforcement learning, and serves as a repository of all past readings.

New members are always welcome! Interested students or researchers may also subscribe to the group mailing list.

Communication

The reading group has a mailing list (rlreadinggroup@utlists.utexas.edu) on which regular announcements are made.

To subscribe to or unsubscribe from the list, send your request by e-mail to ishand@cs.utexas.edu or nskiran@cs.utexas.edu.
To send e-mail to the list, write to rlreadinggroup@utlists.utexas.edu.

Meeting Time and Place

The group meets at 4 p.m. on *Mondays*, as arranged by the coordinators. Due to COVID-19, meetings are hybrid: on Zoom and in GDC 3.516. Meeting time and place may change on occasion.

Next Meeting

Monday, February 28, 2022

Time: 4 pm

Place: Zoom and GDC 3.516

Deep Reinforcement Learning at the Edge of the Statistical Precipice
Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare, NeurIPS 2021
Discussion Leader: Elad Liebman

Reinforcement Learning Resources

Textbook by Richard Sutton and Andrew Barto: Reinforcement Learning: An Introduction.
Peter Stone's Spring 2019 course at the University of Texas: Reinforcement Learning: Theory and Practice.
Rich Sutton's Winter 2010 course at the University of Alberta: Reinforcement Learning in Artificial Intelligence.
Michael Littman's Fall 2005 course at Rutgers University: Learning and Sequential Decision Making.
Yishay Mansour's 2012 workshop on games: RL and Games workshop. Previous runs and other relevant courses can be found on Yishay Mansour's webpage.
Shivaram Kalyanakrishnan's RL reading list.

Paper Readings and Talks (Reverse Chronological Order) [Out of Date]

Summer 2020

"Feature Expansive Reward Learning: Rethinking Human Input"
Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan, arXiv preprint
Discussion Leaders: Reuth Mirsky and William Macke

"Revisiting Fundamentals of Experience Replay"
William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney, ICML 2020
Discussion Leader: Yunshu Du

"Deep Residual Reinforcement Learning"
Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson, AAMAS 2020
Discussion Leader: Bo Liu

"A Closer Look at Deep Policy Gradients"
Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry, ICLR 2020
"Is the Policy Gradient a Gradient?"
Chris Nota, Philip S. Thomas, AAMAS 2020
Discussion Leader: Ishan Durugkar

Spring 2020

"State-only Imitation with Transition Dynamics Mismatch"
Tanmay Gangwani, Jian Peng, ICLR 2020
Discussion Leader: Haresh Karnan

"VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning"
Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson, ICLR 2020
Discussion Leader: Sai Kiran Narayanaswami

"Program Guided Agents"
Shao-Hua Sun, Te-Lin Wu, Joseph J. Lim, ICLR 2020
Discussion Leader: Kai-Chi Huang

"CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning"
Jiachen Yang, Alireza Nakhaei, David Isele, Kikuo Fujimura, Hongyuan Zha, ICLR 2020
Discussion Leader: Elad Liebman

"RUDDER: Return Decomposition for Delayed Rewards"
Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter, NeurIPS 2019
Discussion Leader: Ishan Durugkar

"Training Agents using Upside-Down Reinforcement Learning"
Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaskowski, Jürgen Schmidhuber, arXiv 2019
Discussion Leader: Yunshu Du

"Deep Reinforcement Learning for General Game Playing"
Adrian Goldwaser and Michael Thielscher, AAAI 2020
Discussion Leader: William Macke

"Causal Confusion in Imitation Learning"
Pim de Haan, Dinesh Jayaraman and Sergey Levine, NeurIPS 2019
Discussion Leader: Mauricio B Garcia Tec

"A Divergence Minimization Perspective on Imitation Learning Methods"
Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shixiang Gu, CoRL 2019
Discussion Leader: Bo Liu

Fall 2019

"Uncertainty-Aware Action Advising for Deep Reinforcement Learning Agents"
Felipe Leno Da Silva, Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor, AAAI 2020
Discussion Leader: Yunshu Du

"Compositional Plan Vectors for Multi-Task Control"
Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine, NeurIPS 2019
Discussion Leader: Yifeng Zhu

"Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement"
Chao Yang, Xiaojian Ma, Wenbing Huang, Fuchun Sun, Huaping Liu, Junzhou Huang, Chuang Gan, NeurIPS 2019
Discussion Leader: Faraz Torabi

"Search on the Replay Buffer: Bridging Planning and Reinforcement Learning"
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine, NeurIPS 2019
Discussion Leader: Kai-Chi Huang

"TACO: Learning Task Decomposition via Temporal Alignment for Control"
Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, Ingmar Posner, ICML 2019
Discussion Leader: Farzan Memarian

"When to use parametric models in reinforcement learning?"
Hado van Hasselt, Matteo Hessel, John Aslanides, NeurIPS 2019
Discussion Leader: Haresh Karnan

"Universal Successor Features Approximators"
Diana Borsa, Andre Barreto, John Quan, Daniel J. Mankowitz, Hado van Hasselt, Remi Munos, David Silver, Tom Schaul, ICLR 2019
Discussion Leader: Daniel Brown

"Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL"
Anusha Nagabandi, Chelsea Finn, Sergey Levine, ICLR 2019
Discussion Leader: Sai Kiran Narayanaswami

"Hyperbolic Discounting and Learning over Multiple Horizons"
William Fedus, Carles Gelada, Yoshua Bengio, Marc G. Bellemare, Hugo Larochelle, arXiv 2019
Discussion Leader: Ishan Durugkar

Spring 2019

"Robust Adversarial Reinforcement Learning"
Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta, ICML 2017
Discussion Leader: Haresh Karnan
"Learning to Teach in Cooperative Multiagent Reinforcement Learning"
Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, Jonathan P. How, AAAI 2019
Discussion Leader: Felipe "Leno" da Silva
"NerveNet: Learning Structured Policy with Graph Neural Networks"
Tingwu Wang, Renjie Liao, Jimmy Ba, Sanja Fidler, ICLR 2018
Discussion Leader: Eddy Hudson
"Model-Based Reinforcement Learning via Meta-Policy Optimization"
Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel, CoRL 2018
Discussion Leader: Harsh Goyal
"Contingency-Aware Exploration in Reinforcement Learning"
Jongwook Choi, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, Honglak Lee, ICLR 2019
Discussion Leader: Santhosh Ramakrishnan

Fall 2018

"Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine, ICML 2018
Discussion Leader: Rishi Shah
"Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation"
Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill, NIPS 2017
Discussion Leader: Josiah Hanna
"SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation"
Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song, ICML 2018
Discussion Leader: Ishan Durugkar

"Successor Features for Transfer in Reinforcement Learning"
Andre Barreto, Will Dabney, Remi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver, NIPS 2017
Discussion Leader: Sanmit Narvekar

Spring 2018

"Vector-based navigation using grid-like representations in artificial agents"
Andrea Banino, Caswell Barry, Benigno Uria, Charles Blundell, Timothy Lillicrap, and a small army of other DeepMind and UCL folk.
Discussion Leader: Ruohan "neuron power" Zhang
"Learning by Playing – Solving Sparse Reward Tasks from Scratch"
Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, Tobias Springenberg.
Discussion Leader: Sanmit "ninja" Narvekar
"Prioritized Experience Replay"
Tom Schaul, John Quan, Ioannis Antonoglou and David Silver, ICLR 2016.
Discussion Leader: Aishwarya Padmakumar
"Generalizing Skills with Semi Supervised Reinforcement Learning"
Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel and Sergey Levine, ICLR 2017.
Discussion Leader: Prabhat Nagarajan
"Reinforcement Learning with Unsupervised Auxiliary Tasks"
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver & Koray Kavukcuoglu, ICLR 2017.
Discussion Leader: Josiah "Bada**" Hanna
"Action-Dependent Control Variates for Policy Optimization Via Stein's Identity"
Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu, ICLR 2018.
Discussion Leader: Yihao Feng
"Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning"
Yevgen Chebotar, Karol Hausman, Marvin Zhang, Gaurav Sukhatme, Stefan Schaal, Sergey Levine, ICML 2017.
Discussion Leader: Sid Desai
"Time-Contrastive Networks: Self-Supervised Learning from Video"
Pierre Sermanet, Corey Lynch, Jasmine Hsu, Sergey Levine, arXiv 2017.
Discussion Leader: Faraz Torabi

Fall 2017

"Fairness in Reinforcement Learning"
Shahin Jabbari, Matthew Joseph, Michael Kearns, Jamie Morgenstern, Aaron Roth, ICML 2017.
Discussion Leader: Elad Liebman
"Mastering the game of Go without human knowledge"
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis, Nature 2017.
Discussion Leader: Sanmit Narvekar
"A Distributional Perspective on Reinforcement Learning"
Marc G. Bellemare, Will Dabney, Remi Munos, ICML 2017.
Discussion Leader: Ishan Durugkar
"Bridging the Gap Between Value and Policy Based Reinforcement Learning"
Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans, NIPS 2017.
Discussion Leader: Josiah Hanna
"Cooperative Inverse Reinforcement Learning"
Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel and Stuart Russell, NIPS 2016.
Discussion Leader: Elad Liebman

Summer 2017

"Expected Policy Gradients"
Kamil Ciosek and Shimon Whiteson, arXiv preprint.
Discussion Leader: Ishan Durugkar
"Deep reinforcement learning from human preferences"
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei, arXiv preprint.
Discussion Leader: Garrett Warnell
"Learning from Demonstrations for Real World Reinforcement Learning"
Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys, arXiv preprint.
Discussion Leader: Elad Liebman
"Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning"
Abhishek Gupta, Coline Devin, YuXuan Liu, Pieter Abbeel, and Sergey Levine, ICLR 2017.
Discussion Leader: Josiah Hanna

Spring 2017

"Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning"
Abhishek Gupta, Coline Devin, YuXuan Liu, Pieter Abbeel, and Sergey Levine, ICLR 2017.
Discussion Leader: Josiah Hanna
"Opponent Modeling in Deep Reinforcement Learning"
He He, Jordan Boyd-Graber, Kevin Kwok and Hal Daumé III, ICML 2016.
Discussion Leader: Stefano Albrecht
"The Option-Critic Architecture"
Pierre-Luc Bacon, Jean Harb and Doina Precup, AAAI 2017.
Discussion Leader: Sanmit Narvekar
"Value Iteration Networks"
Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, and Pieter Abbeel, NIPS 2016.
Discussion Leader: Elad Liebman

Fall 2016

"Accelerated Gradient Temporal Difference Learning"
Yangchen Pan, Adam White and Martha White, AAAI 2017.
Discussion Leader: Martha White herself (!)

"True Online TD(λ)"
Harm van Seijen and Richard S. Sutton, ICML 2014 (see also the journal version)
Discussion Leader: Sanmit Narvekar

"Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs"
Finale Doshi, Joelle Pineau and Nicholas Roy, ICML 2008.
Discussion Leader: Elad Liebman
"Model-Based Relative Entropy Stochastic Search"
Rudolf Lioutikov, Nuno Lau, Luis Paulo Reis, Jan Peters, and Gerhard Neumann, NIPS 2015.
Discussion Leader: Patrick MacAlpine
"High-Dimensional Continuous Control Using Generalized Advantage Estimation"
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel, ICLR 2016.
Also touched on: "Trust Region Policy Optimization"
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel, ICML 2015.
Discussion Leaders: Josiah Hanna and Matthew Hausknecht

Spring 2016

"Risk Aversion in Markov Decision Processes via Near-Optimal Chernoff Bounds"
Moldovan and Abbeel, NIPS 2013
Discussion Leader: Michael Albert
"Intrinsically Motivated Hierarchical Skill Learning in Structured Environments"
Vigorito and Barto, IEEE Transactions on Autonomous Mental Development, 2010.
Discussion Leader: Jake Menashe
"Reward Mapping for Transfer in Long-Lived Agents"
Xiaoxiao Guo, Satinder Singh and Richard Lewis, NIPS 2013.
Discussion Leader: Jivko Sinapov
"Offline Evaluation of Online Reinforcement Learning Algorithms"
Travis Mandel, Yun-En Liu, Emma Brunskill and Zoran Popovic, AAAI 2016.
Discussion Leader: Sanmit Narvekar
"Bandits with Unobserved Confounders: A Causal Approach"
Bareinboim, Forney and Pearl, NIPS 2015
Discussion Leader: Dr. Daniel Urieli, the Mensch
"TDγ: Re-evaluating Complex Backups in Temporal Difference Learning"
G.D. Konidaris, S. Niekum, and P.S. Thomas, NIPS 2011
Discussion Leader: Elad Liebman

Fall 2015

"Learning Partial Policies to Speedup MDP Tree Search"
Jervis Pinto and Alan Fern, UAI 2014
Discussion Leader: Elad Liebman
"Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning"
Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, NIPS 2015
Discussion Leader: Matthew Hausknecht
"The Dependence of Effective Planning Horizon on Model Accuracy"
Nan Jiang, Alex Kulesza, Satinder Singh, and Richard Lewis, AAMAS 2015
Discussion Leader: Daniel Urieli

Spring 2015

"Deterministic Policy Gradient Algorithms"
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra and Martin Riedmiller, ICML 2014
Discussion Leader: Matthew Hausknecht
"Policy Gradient Methods for Reinforcement Learning with Function Approximation"
Sutton, McAllester, Singh and Mansour, NIPS 1999
Discussion Leader: Matthew Hausknecht
"Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search"
Guez, Silver and Dayan, NIPS 2012
(The journal version, JAIR 2013, is highly recommended.)
Discussion Leader: Elad Liebman
"Human-level control through deep reinforcement learning"
Mnih et al. (AKA "The DeepMind folks"), Nature 518, February 2015
Discussion Leader: Matthew Hausknecht
"Compress and Control"
Veness et al., AAAI 2015
Discussion Leader: Daniel Urieli
"Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment"
Bou Ammar et al., AAAI 2015
Discussion Leader: Jivko Sinapov

Fall 2014

"PAC-inspired Option Discovery in Lifelong Reinforcement Learning"
Brunskill & Li, ICML 2014
Discussion Leader: Jake Menashe
"Sample Efficient Reinforcement Learning with Gaussian Processes"
Grande et al., ICML 2014
Discussion Leader: Elad Liebman
"Transfer of Samples in Batch Reinforcement Learning"
Lazaric, Restelli & Bonarini, ICML 2008
Discussion Leader: Jivko Sinapov
"Reinforcement Learning with Multi-Fidelity Simulators"
Mark Cutler, Thomas J. Walsh, and Jonathan P. How, ICRA 2014
Discussion Leader: Matteo Leonetti
"Feature Construction for Inverse Reinforcement Learning"
Levine, Popovic and Koltun, NIPS 2010
Discussion Leader: Elad Liebman

Spring 2014

"Bayesian Policy Search with Policy Priors"
David Wingate, Noah D. Goodman, Daniel M. Roy, Leslie P. Kaelbling and Joshua B. Tenenbaum
IJCAI 2011
Discussion Leader: Wesley Tansey
"On Experiences in a Complex and Competitive Gaming Domain: Reinforcement Learning Meets RoboCup"
Martin Riedmiller and Thomas Gabel
CIG 2007
Discussion Leader: Jason Liang
"Coordinated Reinforcement Learning"
Guestrin, Lagoudakis and Parr
ICML 2002
Discussion Leader: Sanmit Narvekar
"Bayesian Multi-Task Reinforcement Learning"
Alessandro Lazaric and Mohammad Ghavamzadeh
ICML 2010
Discussion Leader: Wesley Tansey
"Policy Shaping: Integrating Human Feedback with Reinforcement Learning"
Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, and Andrea Thomaz
NIPS 2013
Discussion Leader: Elad Liebman

Fall 2013

"Bellman Error Based Feature Generation using Random Projections on Sparse Spaces"
Mahdi Milani Fard, Yuri Grinberg, Amir Massoud Farahmand, Joelle Pineau, Doina Precup
NIPS 2013
Discussion Leader: Craig Corcoran
"Multi-task Reinforcement Learning in Partially Observable Stochastic Environments"
Li, Liao and Carin
JMLR 2009
Discussion Leader: Elad Liebman
"Multi-Task Reinforcement Learning: a Hierarchical Bayesian Approach"
Wilson, Fern, Ray and Tadepalli
ICML 2007
Discussion Leader: Wesley Tansey
"Reinforcement Learning in Robotics: a Survey"
Kober, Bagnell and Peters
IJRR 2013
Discussion Leader: Daniel Urieli
Reinforcement Learning for Robot Soccer
Martin Riedmiller, Thomas Gabel, Roland Hafner, and Sascha Lange
Autonomous Robots (2009), 27:55-73, DOI 10.1007/s10514-009-9120-4
Discussion Leader: Sanmit Narvekar
Model Learning in Robotics: a Survey
Duy Nguyen-Tuong and Jan Peters
Cognitive Processing, 12(4), pp. 319-340, 2011
Discussion Leader: Samuel Rubin "Models" Barrett
Dynamic Preferences in Multi-Criteria Reinforcement Learning
Sriraam Natarajan and Prasad Tadepalli, ICML 2005
Discussion Leader: Elad Liebman

Summer 2013

Transfer Learning by Discovering Latent Task Parametrizations
Finale Doshi-Velez and George Konidaris, NIPS 2012
Discussion Leader: Wesley Tansey
Preference Elicitation and Inverse Reinforcement Learning
Constantin A. Rothkopf and Christos Dimitrakakis, ECML 2011
Discussion Leader: Elad Liebman

Spring 2013

Regret minimization in games with incomplete information
Martin Zinkevich, Michael Johanson, Michael Bowling and Carmelo Piccione
Advances in Neural Information Processing Systems 20 (NIPS 2007), pp. 1729-1736, 2008
Discussion Leader: Wesley Tansey
Dynamic Role Assignment using General Value Functions
Saminda Abeyruwan, Andreas Seekircher and Ubbo Visser
International Conference on Humanoid Robots, 2012
Discussion Leader: Samuel Barrett
"Probabilistic Inference for Solving Discrete and Continuous State Markov Decision Processes"
Marc Toussaint, Amos Storkey, ICML 2006
Discussion Leader: Craig Corcoran
(For the extremely interested reader, Craig also suggests taking a gander at these:
Probabilistic inference for solving (PO)MDPs
Expectation-Maximization methods for solving (PO)MDPs and optimal control problems)

Fall 2012

The Self Organization of Context for Learning in Multiagent Games
Chris White and David Brogan
Discussion led by Patrick MacAlpine, November 12, 2012.
Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective
Satinder Singh, Richard Lewis, Andrew Barto, and Jonathan Sorg
Discussion led by Matthew Hausknecht, October 29, 2012.
Natural Actor-Critic
Jan Peters and Stefan Schaal
Discussion led by Daniel Urieli, October 10, 2012.
Policy Gradient Planning for Environmental Decision Making with Existing Simulators
Mark Crowley and David Poole
Discussion led by Todd Hester, September 24, 2012.
Multi-timescale Nexting in a Reinforcement Learning Robot
Joseph Modayil, Adam White, and Richard Sutton
Discussion led by Sam Barrett, September 10, 2012.

Spring 2012

Hierarchical POMDP Controller Optimization by Likelihood Maximization
Marc Toussaint, Laurent Charlin, and Pascal Poupart
Exploiting Best-Match Equations for Efficient Reinforcement Learning
Harm van Seijen, Shimon Whiteson, Hado van Hasselt, Marco Wiering
Discussion led by Matthew Hausknecht, March 26, 2012.
A Monte-Carlo AIXI Approximation
Joel Veness, Kee Siong Ng, Marcus Hutter, William Uther, David Silver
Discussion led by Todd Hester, March 5, 2012.
Learned Behaviors of Multiple Autonomous Agents in Smart Grid Markets
Prashant P. Reddy and Manuela Veloso
Discussion led by Daniel Urieli, February 20, 2012.
Conjugate Markov Decision Processes
Philip S. Thomas and Andrew G. Barto
Discussion led by Sam Barrett, February 6, 2012.

Fall 2011

Model-Free Least Squares Policy Iteration
M. Lagoudakis and R. Parr
Discussion led by Brad Knox, November 7, 2011.
An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman
Discussion led by Craig Corcoran, October 24, 2011.
Optimizing Debt Collections Using Constrained Reinforcement Learning
Naoki Abe, Vince P. Thomas, Melissa Kowalczyk, et al.
Discussion led by Matthew Hausknecht, October 10, 2011.
Technical update: Least-squares temporal difference learning
Justin A. Boyan
Discussion led by Brad Knox, September 26, 2011.
Online Discovery of Feature Dependencies
Alborz Geramifard, Finale Doshi, Joshua Redding, Nicholas Roy, Jonathan P. How, 2011
Discussion led by Doran Chakraborty, September 12, 2011.

Spring 2011

The Nonstochastic Multiarmed Bandit Problem
Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund and Robert E. Schapire, 2002
Discussion led by Doran Chakraborty, April 18, 2011.
A Contextual-Bandit Approach to Personalized News Article Recommendation
Lihong Li, Wei Chu, John Langford and Robert E. Schapire, 2010
Discussion led by Doran Chakraborty, March 28, 2011.
Reinforcement Learning via Practice and Critique Advice
Kshitij Judah, Saikat Roy, Alan Fern and Thomas G. Dietterich, 2010
Discussion led by Brad Knox, March 7, 2011.
Reinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach
Evangelos Theodorou, Jonas Buchli and Stefan Schaal, 2010
Discussion led by Daniel Urieli, February 14, 2011.
Success, strategy and skill: an experimental study
Christopher Archibald, Alon Altman and Yoav Shoham, 2010
Modeling Billiards Games
Christopher Archibald and Yoav Shoham, 2009
Discussion led by Shivaram Kalyanakrishnan, January 31, 2011.

Fall 2010

A Perspective View and Survey of Meta-Learning
Ricardo Vilalta and Youssef Drissi, 2002
Discussion led by Shivaram Kalyanakrishnan, November 29, 2010.
Maximum Entropy Inverse Reinforcement Learning
Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell and Anind K. Dey, 2008
Discussion led by Rahul Suri, November 8, 2010.
Basis Function Construction For Hierarchical Reinforcement Learning
Sarah Osentoski and Sridhar Mahadevan, 2010
Discussion led by Brad Knox, November 1, 2010.
Interactive Learning of Mappings from Visual Percepts to Actions
Sébastien Jodogne and Justus H. Piater, 2005
Discussion led by Juhyun Lee, October 18, 2010.
Autonomous Blimp Control using Model-free Reinforcement Learning in a Continuous State and Action Space
Axel Rottman, Christian Plagemann, Peter Hilgers and Wolfram Burgard, 2010
Discussion led by Michael Quinlan, September 27, 2010.
Multiagent Reinforcement Learning for Urban Traffic Control using Coordination Graphs
Lior Kuyer, Shimon Whiteson, Bram Bakker and Nikos Vlassis, 2008
Discussion led by Chiu Au, September 13, 2010.
Integrating Sample-based Planning and Model-based Reinforcement Learning
Thomas J. Walsh, Sergiu Goschin and Michael L. Littman, 2010
Discussion led by Todd Hester, August 30, 2010.

Summer 2010

Nonparametric Return Distribution Approximation for Reinforcement Learning
Tetsuro Morimura, Masashi Sugiyama, Hisashi Kashima, Hirotaka Hachiya and Toshiyuki Tanaka, 2010
Discussion led by Daniel Urieli, August 9, 2010.
ε-First Policies for Budget-Limited Multi-Armed Bandits
Long Tran-Thanh, Archie Chapman, Enrique Munoz de Cote, Alex Rogers and Nicholas R. Jennings, 2010
Discussion led by Shivaram Kalyanakrishnan, August 2, 2010.

Spring 2010

Variable Resolution Discretization in Optimal Control
Rémi Munos and Andrew Moore, 2002
Discussion led by Tobias Jung, May 10, 2010.
Basis function adaptation in temporal difference reinforcement learning
Ishai Menache, Shie Mannor and Nahum Shimkin, 2005
Discussion led by Shivaram Kalyanakrishnan, May 3, 2010.
Bayesian Inverse Reinforcement Learning
Deepak Ramachandran and Eyal Amir, 2007
Discussion led by Adam Setapen, April 19, 2010.
A Comprehensive Survey of Multiagent Reinforcement Learning
Lucian Buşoniu, Robert Babuška and Bart De Schutter, 2008
Discussion led by Doran Chakraborty, April 5, 2010.
Where Do Rewards Come From?
Satinder Singh, Richard L. Lewis and Andrew G. Barto, 2010
Discussion led by Todd Hester, March 29, 2010.
Learning All Optimal Policies with Multiple Criteria
Leon Barrett and Srini Narayanan, 2008
Discussion led by Sam Barrett, March 8, 2010.
Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
George Konidaris and Andrew Barto, 2009
Discussion led by Brad Knox, February 15, 2010.

Fall 2009

Efficient Reinforcement Learning for Motor Control
Marc Peter Deisenroth and Carl Edward Rasmussen, 2009
Discussion led by Todd Hester, November 30, 2009.
Binary Action Search for Learning Continuous-Action Control Policies
Jason Pazis and Michail G. Lagoudakis, 2009
Discussion led by Shivaram Kalyanakrishnan, November 16, 2009.
Ph.D. Oral Proposal: Practice Talk
Shivaram Kalyanakrishnan, November 2, 2009.
The Adaptive k-Meteorologists Problem and Its Application to Structure Learning and Feature Selection in Reinforcement Learning
Carlos Diuk, Lihong Li and Bethany R. Leffler, 2009
Discussion led by Doran Chakraborty, October 19, 2009.
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
Engin İpek, Onur Mutlu, José F. Martínez and Rich Caruana, 2008
Discussion led by Matthew Hausknecht, September 28, 2009.
Decision theory, reinforcement learning, and the brain
Peter Dayan and Nathaniel D. Daw, 2008
Discussion led by Igor Karpov, September 14, 2009.

Summer 2009

No meetings.

Spring 2009

Experiments in Animal Behavior
Presentation by Brad Knox, April 29, 2009.
Using Reinforcement Learning to Adapt an Imitation Task
Florent Guenter and Aude G. Billard, 2007
Discussion led by Brad Knox, April 17, 2009.
Multi-resolution Exploration in Continuous Spaces
Ali Nouri and Michael L. Littman, 2008
Discussion led by Todd Hester, April 8, 2009.
Evolving Neural Networks for Strategic Decision-Making Problems
Nate Kohl and Risto Miikkulainen, 2009
Discussion led by Nate Kohl, April 3, 2009.
Gaussian Process Dynamic Programming
Marc Peter Deisenroth, Carl Edward Rasmussen and Jan Peters, 2009
Discussion led by Tobias Jung, March 11, 2009.
An Empirical Analysis of Value Function-Based and Policy Search Reinforcement Learning
Shivaram Kalyanakrishnan and Peter Stone, 2009
Discussion led by Shivaram Kalyanakrishnan, March 6, 2009.

Fall 2008

Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games
Maria Cutumisu, Duane Szafron, Michael Bowling and Richard S. Sutton, 2008
Discussion led by Jacob Schrum, December 10, 2008.
An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection for Reinforcement Learning
Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield and Michael L. Littman, 2008
Discussion led by Nick Jong, November 26, 2008.
An Analysis of Reinforcement Learning with Function Approximation
Francisco S. Melo, Sean P. Meyn and M. Isabel Ribeiro, 2008
Discussion led by Shivaram Kalyanakrishnan, November 12, 2008.
Strategy Evaluation in Extensive Games with Importance Sampling
Michael Bowling, Michael Johanson, Neil Burch and Duane Szafron, 2008
Discussion led by Doran Chakraborty, October 29, 2008.
HiPPo: Hierarchical POMDPs for Planning Information Processing and Sensing Actions on a Robot
Mohan Sridharan, Jeremy Wyatt and Richard Dearden, 2008
Presentation by Mohan Sridharan, October 22, 2008.
Knows What It Knows: A Framework For Self-Aware Learning
Lihong Li, Michael L. Littman and Thomas J. Walsh, 2008
Discussion led by Todd Hester, October 1, 2008.
Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs
Finale Doshi, Joelle Pineau and Nicholas Roy, 2008
Discussion led by Nick Jong, September 11, 2008.

Summer 2008

Planning and Learning in Environments with Delayed Feedback
Thomas J. Walsh, Ali Nouri, Lihong Li and Michael L. Littman, 2007
Discussion led by Shivaram Kalyanakrishnan, June 20, 2008.
Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning
Arthur Guez, Robert D. Vincent, Massimo Avoli and Joelle Pineau, 2008
Discussion led by Matt Taylor, May 30, 2008.

Spring 2008

Perspectives on Reinforcement Learning: Group Discussion
Discussion led by Michael Quinlan, April 25, 2008.
Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis
Claudia V. Goldman and Shlomo Zilberstein, 2004
Discussion led by Doran Chakraborty, April 11, 2008.
Planning with Durative Actions in Stochastic Domains
Mausam and Daniel S. Weld, 2007
Discussion led by Doran Chakraborty, April 4, 2008.
Fourteen Declarative Principles for an Integrative Science of the Temporal Dynamics of Learning
Richard S. Sutton, 2008
Discussion led by Todd Hester, March 21, 2008.
Factor-Guided Motion Planning for a Robot Arm
Jaesik Choi and Eyal Amir, 2007
Discussion led by Michael Quinlan, February 29, 2008.
Invited Speaker: Eyal Amir.
Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man
István Szita and András Lőrincz, 2007
Discussion led by Matt Taylor, February 15, 2008.
A Natural Policy Gradient
Sham Kakade, 2002
Discussion led by Andrew Dreher, January 25, 2008.

Fall 2007

ICML-07 Tutorial on Bayesian Methods for Reinforcement Learning, Part 4
Pascal Poupart, Mohammad Ghavamzadeh and Yaakov Engel, 2007
Reinforcement learning with Gaussian processes
Yaakov Engel, Shie Mannor and Ron Meir, 2005
Discussion led by Joe Reisinger, December 7, 2007.
ICML-07 Tutorial on Bayesian Methods for Reinforcement Learning, Parts 1, 2, 3
Pascal Poupart, Mohammad Ghavamzadeh and Yaakov Engel, 2007
Discussion led by Nick Jong, November 30, 2007.
Constructing Basis Functions from Directed Graphs for Value Function Approximation
Jeff Johns and Sridhar Mahadevan, 2007
Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
Sridhar Mahadevan and Mauro Maggioni, 2007
Discussion led by Matt Taylor, November 16, 2007.
The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems
Caroline Claus and Craig Boutilier, 1998
Discussion led by Doran Chakraborty, November 9, 2007.
Linearly-solvable Markov Decision Problems
Emanuel Todorov, 2006
Discussion led by Ian Fasel and Shivaram Kalyanakrishnan, October 12, 2007.
Combining Online and Offline Knowledge in UCT
Sylvain Gelly and David Silver, 2007
Discussion led by Joe Reisinger, September 21, 2007.
Dynamic Positioning in 3D RoboCup Soccer
Presentation by Sahar Asadi, September 7, 2007.

Summer 2007

Dirichlet Process Mixtures
Khalid El-Arini, 2005
A Bayesian Framework for Reinforcement Learning
Malcolm Strens, 2000
Discussion led by Nick Jong, June 29, 2007.
Multi-Task Reinforcement Learning: A Hierarchical Bayesian Approach
Aaron Wilson, Alan Fern, Soumya Ray and Prasad Tadepalli, 2007
Discussion led by Todd Hester, June 15, 2007.
Nash Q-Learning for General-Sum Stochastic Games
Junling Hu and Michael P. Wellman, 2003
Discussion led by Shimon Whiteson, June 8, 2007.
Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games
Colin McMillen and Manuela Veloso, 2007
Discussion led by Shivaram Kalyanakrishnan, June 1, 2007.
Efficient Reinforcement Learning with Relocatable Action Models
Bethany R. Leffler, Michael L. Littman and Timothy Edmunds, 2007
Discussion led by Matt Taylor, May 25, 2007.

Spring 2007

Adaptive Representations for Reinforcement Learning
Shimon Azariah Whiteson, 2007
Ph.D. defense by Shimon Whiteson, April 20, 2007.
Integrating Guidance into Relational Reinforcement Learning
Kurt Driessens and Sašo Džeroski, 2004
Discussion led by Andrew Dreher, April 6, 2007.
Online Learning and Exploiting Relational Models in Reinforcement Learning
Tom Croonenborghs, Jan Ramon, Hendrik Blockeel and Maurice Bruynooghe, 2007
Discussion led by Matt Taylor, March 23, 2007.
State Similarity Based Approach for Improving Performance in RL
Sertan Girgin, Faruk Polat and Reda Alhajj, 2007
Discussion led by Todd Hester, March 9, 2007.
Deictic Option Schemas
Balaraman Ravindran, Andrew G. Barto and Vimal Mathew, 2007
Discussion led by Rahul Iyer, March 2, 2007.
Bayesian Q-learning
Richard Dearden, Nir Friedman and Stuart Russell, 1998
Discussion led by David Pardoe, February 9, 2007.
An Intrinsic Reward Mechanism for Efficient Exploration
Özgür Şimşek and Andrew G. Barto, 2006
Discussion led by Shivaram Kalyanakrishnan, January 26, 2007.

Fall 2006

Decision Tree Methods for Finding Reusable MDP Homomorphisms
Alicia Peregrin Wolfe and Andrew G. Barto, 2006
Discussion led by Rahul Iyer, December 4, 2006.
Robot planning in partially observable continuous domains
Josep M. Porta, Matthijs T. J. Spaan and Nikos Vlassis, 2005
Discussion led by Igor Karpov, November 20, 2006.
Reinforcement Learning in POMDP's via Direct Gradient Ascent
Jonathan Baxter and Peter L. Bartlett, 2000
Discussion led by Yaxin Liu, November 6, 2006.
Reinforcement Learning for Optimized Trade Execution
Yuriy Nevmyvaka, Yi Feng and Michael Kearns, 2006
Discussion led by Andrew Dreher, October 23, 2006.
Looping Suffix Tree-Based Inference of Partially Observable Hidden State
Michael P. Holmes and Charles Lee Isbell, Jr., 2006
Discussion led by Nick Jong, October 9, 2006.
Sparse Cooperative Q-learning
Jelle R. Kok and Nikos Vlassis, 2004
Discussion led by Shivaram Kalyanakrishnan, September 25, 2006.
Using Homomorphisms to Transfer Options across Continuous Reinforcement Learning Domains
Vishal Soni and Satinder Singh, 2006
Discussion led by Matt Taylor, September 11, 2006.

Summer 2006

PAC Model-Free Reinforcement Learning
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford and Michael L. Littman, 2006
Discussion led by Shimon Whiteson, August 15, 2006.

Spring 2006

State Space Reduction for Autonomous Reinforcement Learning
Mehran Asadi and Manfred Huber, 2004
Accelerating Action Dependent Hierarchical Reinforcement Learning Through Autonomous Subgoal Discovery
Mehran Asadi and Manfred Huber, 2005
Discussion led by Nick Jong, April 28, 2006.
Autonomous Helicopter Flight via Reinforcement Learning
Andrew Y. Ng, H. Jin Kim, Michael I. Jordan and Shankar Sastry, 2004
Discussion led by Shivaram Kalyanakrishnan, April 14, 2006.
Policy Gradient Methods for Reinforcement Learning with Function Approximation
Richard S. Sutton, David McAllester, Satinder Singh and Yishay Mansour, 2000
Discussion led by Shimon Whiteson, March 31, 2006.
Building Portable Options: Skill Transfer in Reinforcement Learning
George Konidaris and Andrew Barto, 2006
Discussion led by Matt Taylor, March 10, 2006.
CBR for State Value Function Approximation in Reinforcement Learning
Thomas Gabel and Martin Riedmiller, 2005
Discussion led by Nick Jong, February 24, 2006.
Why (PO)MDPs Lose for Spatial Tasks and What to Do About It
Terran Lane and William D. Smart, 2005
Discussion led by Shivaram Kalyanakrishnan, February 10, 2006.
Temporal-Difference Networks
Richard S. Sutton and Brian Tanner, 2005
Discussion led by Bikram Banerjee, January 27, 2006.

Fall 2005

Developing navigation behavior through self-organizing distinctive state abstraction
Jefferson Provost, Benjamin J. Kuipers and Risto Miikkulainen, 2006
Discussion led by Jeff Provost, December 14, 2005.
An Algorithmic Description of XCS
Martin V. Butz and Stewart W. Wilson, 2000
Discussion led by David Pardoe, December 2, 2005.
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
Martin Riedmiller, 2005
A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm
Martin Riedmiller and Heinrich Braun, 1993
Discussion led by Shivaram Kalyanakrishnan, November 18, 2005.
The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State Spaces
Andrew W. Moore and Christopher G. Atkeson, 1995
Discussion led by Nick Jong, November 4, 2005.
Samuel Meets Amarel: Automating Value Function Approximation using Global State Space Analysis
Sridhar Mahadevan, 2005
Discussion led by Jeff Provost, October 14, 2005.
Near-Optimal Reinforcement Learning in Polynomial Time
Michael Kearns and Satinder Singh, 1998
Discussion led by Yaxin Liu, September 23, 2005.

Summer 2005

An Empirical Evaluation of Interval Estimation for Markov Decision Processes
Alexander L. Strehl and Michael L. Littman, 2004
Discussion led by Shimon Whiteson, August 26, 2005.
Using Advice to Transfer Knowledge Acquired in One Reinforcement Learning Task to Another
Lisa Torrey, Trevor Walker, Jude Shavlik and Richard Maclin, 2005
Discussion led by Matt Taylor, August 12, 2005.
Guiding Inference through Relational Reinforcement Learning
Nima Asgharbeygi, Negin Nejati, Pat Langley and Sachiyo Arai, 2005
Discussion led by Matt Taylor, June 17, 2005.
Least-Squares Policy Iteration
Michail G. Lagoudakis and Ronald Parr, 2003
Discussion led by Lily Mihalkova, May 27, 2005.
Relational Reinforcement Learning
Sašo Džeroski, Luc De Raedt and Kurt Driessens, 2001
Discussion led by Greg Kuhlmann, May 6, 2005.

Spring 2005

Intrinsically Motivated Reinforcement Learning
Satinder Singh, Andrew G. Barto and Nuttapong Chentanez, 2005
Discussion led by Nick Jong, April 22, 2005.
Gradient Descent for General Reinforcement Learning
Leemon Baird and Andrew Moore, 1999
Discussion led by Shimon Whiteson, April 8, 2005.
Accelerating Reinforcement Learning by Composing Solutions of Automatically Identified Subtasks
Chris Drummond, 2002
Discussion led by Matt Taylor, March 25, 2005.
A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
Michael Kearns, Yishay Mansour and Andrew Y. Ng, 2001
Discussion led by David Pardoe, March 4, 2005.
No Free Lunch Theorems for Optimization
David H. Wolpert and William G. Macready, 1996
Discussion led by Shimon Whiteson, February 18, 2005.
Common Myths and Misstatements about Reinforcement Learning
Various Authors, 1999 onwards
Discussion led by Matt Taylor, February 4, 2005.

Fall 2004

Reinforcement Learning as Classification: Leveraging Modern Classifiers
Michail G. Lagoudakis and Ronald Parr, 2003
Discussion led by Nick Jong, November 29, 2004.
Discovering Hierarchy in Reinforcement Learning with HEXQ
Bernhard Hengst, 2002
Discussion led by Mazda Ahmadi, November 15, 2004.
Synthesizing Policy Search and Temporal Difference Methods for Reinforcement Learning
Discussion led by Shimon Whiteson, November 1, 2004.
Implicit Negotiation in Repeated Games
Michael L. Littman and Peter Stone, 2001
Discussion led by Matt Taylor, October 25, 2004.
Markov Games as a Framework for Multi-Agent Reinforcement Learning
Michael L. Littman, 1994
Discussion led by David Pardoe, October 11, 2004.
RL Methodology
Discussion led by Lily Mihalkova, September 20, 2004.

Summer 2004

Three Automated Stock-Trading Agents: A Comparative Study
Alexander A. Sherstov and Peter Stone, 2004
Practice talk by Sasha Sherstov, July 9, 2004.
Bidding for Customer Orders in TAC SCM
David Pardoe and Peter Stone, 2004
Practice talk by David Pardoe, July 9, 2004.
The MAXQ Method for Hierarchical Reinforcement Learning
Thomas G. Dietterich, 1998
Reinforcement Learning: A Survey, Section 6
Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore, 1996
Discussion led by Jeff Provost.
Acting Optimally in Partially Observable Stochastic Domains
Anthony R. Cassandra, Leslie Pack Kaelbling and Michael L. Littman, 1994
Discussion led by Shimon Whiteson, June 11, 2004.
Reinforcement Learning: A Survey, Sections 4 and 5
Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore, 1996
Reinforcement Learning: An Introduction, Chapter 9
Richard S. Sutton and Andrew G. Barto, 1998
Discussion on model-free and model-based learning led by Peggy Fidelman, May 29, 2004.
Residual Algorithms: Reinforcement Learning with Function Approximation
Leemon Baird, 1995
Discussion led by Lily Mihalkova, May 14, 2004.

Spring 2004

A Quantitative Study of Hypothesis Selection
Philip W. L. Fong, 1995
Discussion led by Nick Jong, April 23, 2004.
Machine Learning for Fast Quadrupedal Locomotion
Nate Kohl and Peter Stone, 2004
Practice talk by Nate Kohl, April 16, 2004.
Policy invariance under reward transformations: Theory and application to reward shaping
Andrew Y. Ng, Daishi Harada and Stuart Russell, 1999
Discussion led by Greg Kuhlmann, April 9, 2004.
Reinforcement Learning with Replacing Eligibility Traces
Satinder P. Singh and Richard S. Sutton, 1996
Discussion led by David Pardoe, March 26, 2004.
Learning to Predict by the Methods of Temporal Differences
Richard S. Sutton, 1988
Discussion led by Sasha Sherstov, March 5, 2004.
On the Complexity of Solving Markov Decision Problems
Michael L. Littman, Thomas L. Dean and Leslie Pack Kaelbling, 1995
Discussion led by Matt Taylor, February 20, 2004.
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
Juan C. Santamaría, Richard S. Sutton and Ashwin Ram, 1998
Discussion led by Nick Jong, February 6, 2004.
Reinforcement Learning: A Survey
Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore, 1996
General background discussion, January 23, 2004.

Please report any broken links or inconsistencies to Ishan Durugkar.