**Reinforcement Learning Reading Group** The Reinforcement Learning Reading Group at the University of Texas at Austin CS department is a student-run group that discusses research papers related to reinforcement learning. Ever since its first meeting in the spring of 2004, the group has served as a forum for students to discuss interesting research ideas in an informal setting. Meetings are usually held in the afternoon and refreshments are provided. Occasionally, the group hosts invited talks.

Group Photo — Group Meeting on 04/18/2025

The group is currently being coordinated by [Chang Shi](https://changshiraine.github.io/), [Siddhant Agarwal](https://agarwalsiddhant10.github.io/) and [Cevahir Koprulu](https://cevahir-koprulu.github.io/). The previous `glorious` coordinators are: * [Jiaxun Cui](https://cuijiaxun.github.io/) and [Harshit Sikchi](https://hari-sikchi.github.io/) (Fall 2022 - Spring 2025) * [Ishan Durugkar](https://idurugkar.github.io/) (Fall 2017 - Spring 2022) * [Elad Liebman](http://www.cs.utexas.edu/~eladlieb) (Fall 2012 - Spring 2019) * [Matthew Hausknecht](https://mhauskn.github.io/) (Fall 2011 - Fall 2012) * [Shivaram Kalyanakrishnan](https://www.cse.iitb.ac.in/~shivaram/) (Spring 2006 - Spring 2011) * [Matt Taylor](https://drmatttaylor.net/) (Spring 2004 - Fall 2005) This page provides information about group meetings. Also, it lists useful resources for reinforcement learning, and serves as a repository of all past readings. ## Want to join us? New members are always welcome! Interested students or researchers may also subscribe to the group e-mailing list.
To do so, mail Chang (`changshi _at_ utexas.edu`), Siddant (`siddhant _at_ cs.utexas.edu`) or Cevahir (`cevahir.koprulu _at_ utexas.edu`) and they can add you. The reading group has an e-mailing list (`rlreadinggroup@utlists.utexas.edu`) on which regular announcements are made. ## Next Meeting The meeting is scheduled from 2pm to 3pm on Fridays at GDC 3.516 in Fall 2025. You can also join through [Zoom](https://utexas.zoom.us/j/83792943808?pwd=ABHbLqw5NaSRdAOgbX9mommDoNKcC7.1). ## Fall 2025 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | Sept 5 |[Horizon Reduction makes RL scalable](https://arxiv.org/pdf/2506.04168)| Carl Qi | | Sept 19|[Efficient Online Reinforcement Learning for Diffusion Policy](https://openreview.net/forum?id=6Anv3KB9lz)| Da Liu | | Sept 26|[Real-Time Execution of Action Chunking Flow Policies](https://www.pi.website/research/real_time_chunking)| Kavin Black (UC Berkeley/Physical Intelligence)| | Oct 3| [Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination](https://arxiv.org/abs/2504.12714)| Cameron Angliss| | Oct 10|[Latent Diffusion Planning for Imitation Learning](https://arxiv.org/abs/2504.16925)| Cole Harrison | | Oct 17|[Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations](https://arxiv.org/abs/2412.14803) | Max Rudolph| Previous Discussions ============================================================== For a list of previous discussions, refer to the [older version](https://cs.utexas.edu/~rlrg/old_index.html) of this website. Below the papers discussed are arranged in a reverse chronological order ## Spring 2025 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | January 17 |[Distributional Successor Features Enable Zero-Shot Policy Optimization](https://arxiv.org/abs/2403.06328)| Chuning Zhu (U Washington) | | February 7 | Temporal Difference Flows | Jesse Farebrother (MILA) | | February 14 | [Goodhart’s Law in Reinforcement Learning](https://arxiv.org/pdf/2310.09144) | Jasmeet Kaur | | February 28 | [Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning](https://openreview.net/forum?id=xoIeVdFO7U) | Siddhant Agarwal | | March 7 | [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://arxiv.org/pdf/2402.03300) | Brett Barkley | | March 14 | [Rule-Bottleneck Reinforcement Learning: Joint Explanation and Decision Optimization for Resource Allocation with Language Agents](https://arxiv.org/abs/2502.10732) | Mauricio Tec (Harvard University) | | March 28 | [Tapered Off-Policy REINFORCE: Stable and Efficient Reinforcement Learning for LLMs](https://arxiv.org/abs/2503.14286) | Haoran Xu | | April 4 | [Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game](https://proceedings.mlr.press/v235/xu24ad.html) | Caroline Wang| | April 18 | [Towards General-Purpose Model-Free Reinforcement Learning]() | Harshit Sikchi | ## Fall 2024 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | September 13 |[Understanding and Diagnosing Deep Reinforcement Learning](https://50ad7ffc.streaklinks.com/CHNlBS29fSiG5RKHhw_n9IHx/https%3A%2F%2Fproceedings.mlr.press%2Fv235%2Fkorkmaz24a.html)| Caleb Chuck | | October 4 |[Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning](https://openreview.net/pdf?id=LjivA1SLZ6)| Zhihan Wang | | October 11 |[Let's Verify Step by Step](https://openreview.net/forum?id=v8L0pN6EOi)| Jiaxun Cui | | October 18 |[RL, but don't do anything I wouldn't do](https://arxiv.org/abs/2410.06213) | Siddhant Agarwal | | October 25 |[Synthetic Experience Replay](https://arxiv.org/abs/2303.06614) | Brett Barkley | | November 1 |[REBEL: Reinforcement Learning via Regressing Relative Rewards](https://arxiv.org/abs/2404.16767) | Harshit Sikchi | | November 8 |[Imitating Language via Scalable Inverse Reinforcement Learning](https://arxiv.org/abs/2409.01369) | Haoran Xu | | November 15 |[Acquiring Diverse Skills Using Curriculum Reinforcement Learning with a Mixture of Experts](https://arxiv.org/abs/2403.06966) | Cevahir Koprulu | | November 22 |[Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning (RL4VLM)](https://arxiv.org/abs/2405.10292) | Mike | ## Spring 2024 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | January 19 |[Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient](https://arxiv.org/abs/2210.06718)| Haoran Xu | | February 2 |[On the Impossibility of Learning to Cooperate with Adaptive Partner Strategies in Repeated Games](https://proceedings.mlr.press/v162/loftin22a.html)| Arrasy Rahman | | February 9 |[Contrastive Preference Learning: Learning from Human Feedback without RL](https://openreview.net/forum?id=iX1RjVQODj)| Joey Hejna (External) | | February 16 |[Multi-Agent Diagnostics for Robustness via Illuminated Diversity](https://arxiv.org/abs/2401.13460)| Harshit Sikchi | | February 23 |[METRA: Scalable Unsupervised RL with Metric-Aware Abstraction](https://openreview.net/forum?id=c5pwL0Soay)| Siddhant Agarwal | | March 08 |[Reward-Free Curricula for Training Robust World Models](https://openreview.net/forum?id=eCGpNGDeNu)| Cevahir Koprulu | | March 29 |[Stop Regressing: Training Value Functions via Classification for Scalable Deep RL](https://arxiv.org/abs/2403.03950)| Max Rudolph | | April 04 |[Provable Compositional Generalization for Object Centric Learning](https://openreview.net/attachment?id=7VPTUWkiDQ&name=pdf)| Caleb Chuck | | April 12 |[Multi-Agent Reinforcement Learning is A Sequence Modeling Problem](https://proceedings.neurips.cc/paper_files/paper/2022/file/69413f87e5a34897cd010ca698097d0a-Supplemental-Conference.pdf)| Caroline Wang | | April 19 |[Distributional Bellman Operators over Mean Embeddings](https://arxiv.org/abs/2312.07358)| Kevin Li (Deepmind) | | April 26 |[State-Action Similarity-Based Representations for Off-Policy Evaluation](https://arxiv.org/abs/2310.18409)| Brahma Pavse (UW Madison) | | May 03 |[Inference via Interpolation: Contrastive Representations Provably Enable Planning and Inference](https://arxiv.org/abs/2403.04082)| Braham Snyder | ## Fall 2023 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | September 1 |[Bridging RL Theory and Practice with the Effective Horizon](https://arxiv.org/pdf/2304.09853.pdf)| Harshit Sikchi | | September 8 |[Adversarial Policies Beat Superhuman Go AIs](https://goattack.far.ai/)| Jiaxun Cui | | September 15 |[Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning](https://icml.cc/virtual/2023/oral/25490)| Zizhao Wang | | September 22 |[Massively Scalable Inverse Reinforcement Learning in Google Maps](https://arxiv.org/pdf/2305.11290.pdf)| Chang Shi | | September 29 |[A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs](https://arxiv.org/pdf/2306.03236.pdf)| Zifan | | October 06 | [Champion-level drone racing using deep reinforcement learning](https://www.nature.com/articles/s41586-023-06419-4) | Srinath Tankasala| | October 13 | [Hierarchical Empowerment: Towards tractable empowerment-based skill learning](https://arxiv.org/pdf/2307.02728.pdf) | Caleb Chuck| | October 20 | [Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks](https://openreview.net/forum?id=oGDKSt9JrZi) | Siddhant Agarwal| | October 27 | [Foundation Models for Decision Making: Problems, Methods, and Opportunities](https://arxiv.org/abs/2303.04129) | Sherry Yang| | November 3| [Impossibly Good Experts and How to Follow Them](https://openreview.net/forum?id=sciA_xgYofB) | Jeff (Jiaheng)| | November 10 | [Guiding Pretraining in Reinforcement Learning with Large Language Models](https://arxiv.org/pdf/2302.06692.pdf) | Carl Qi| | November 17 | [The Dormant Neuron Phenomenon in Deep Reinforcement Learning](https://icml.cc/virtual/2023/oral/25564) | Max Rudolph| | December 8 | [Human-Timescale Adaptation in an Open-Ended Task Space](https://arxiv.org/abs/2301.07608) | Michael Munje| ## Spring 2023 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | Feb 8 |[Human-level play in the game of Diplomacy by combining language models with strategic reasoning](https://www.science.org/doi/10.1126/science.ade9097)| Jiaxun Cui | | Feb 22 |[Extreme Q-Learning: MaxEnt RL without Entropy](https://div99.github.io/XQL/)| Harshit Sikchi | | March 22 |[Deep Laplacian-based Options](https://arxiv.org/pdf/2301.11181.pdf) | Caleb Chuck | | March 29 |[Model-Based Uncertainty in Value Functions](https://arxiv.org/pdf/2302.12526.pdf) | Jasmeet Kaur | | April 5 |[Evolving Curricula with Regret-based Environment Design](https://proceedings.mlr.press/v162/parker-holder22a/parker-holder22a.pdf) | Zifan Xu | | April 19 |[DHRL: A Graph-Based Approach for Long-Horizon and Sparse Hierarchical Reinforcement Learning](https://neurips.cc/virtual/2022/poster/53567) | Chang Shi | | April 26 (10 am)|[Game Theory x RL]() | Haobo Fu (Invited Speaker) | ## Fall 2022 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | Sep 7 | [Deep Radial-Basis Valuea Functions for Continuous Control](https://ojs.aaai.org/index.php/AAAI/article/view/16828) | Ishan Durugkar | | Sep 21 | [The Information Geometry of Unsupervised Reinforcement Learning](https://iclr.cc/virtual/2022/oral/6207) | Harshit Sikchi | | Oct 5 | [A Distributional Perspective on Reinforcement Learning](https://proceedings.mlr.press/v70/bellemare17a) | Caroline Wang, Jiaxun Cui | | Oct 19 | [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) | Srinath Tankasala | | Oct 26 | [Discovering Faster Matrix Multiplication Algorithms with Reinforcement Learning](https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor) | Ben Nativi | | Nov 2 | [Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning](https://arxiv.org/pdf/2207.09081.pdf)| Caleb Chuck | | Nov 16| [Bootstrapped Meta-Learning](https://arxiv.org/pdf/2109.04504.pdf) | Jasmeet Kaur | ## Spring 2022 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | Mar 28 | [DAC: Double Actor-Critic Architecture for Learning Options](https://arxiv.org/abs/1904.12691) | Sid Desai | | Mar 21 | [Discovery of Options via Meta-learned subgoals](https://openreview.net/forum?id=AADxnPG-PR) | Caleb Chuck | | Mar 7 | [On the Expressivity of Markov Reward](https://arxiv.org/abs/2111.00876) | Jiaxun Cui | | Feb 28 | [Deep Reinforcement Learning at the Edge of the Statistical Precipice](https://arxiv.org/abs/2108.13264) | Elad Liebman | | Feb 21 | [Replacing Rewards with Examples: Example-Based Policy Search
via Recursive Classification](https://arxiv.org/abs/2103.12656) | Ishan Durugkar | | Feb 7 | [PEBBLE: Feedback-Efficient Interactive Reinforcement Learning
via Relabeling Experience and Unsupervised Pre-training](https://arxiv.org/abs/2106.05091) | Jordan Schneider | ## Fall 2021 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | Nov 29 | [Muesli: Combining Improvements in Policy Optimization](http://proceedings.mlr.press/v139/hessel21a/hessel21a.pdf) | Sai Kiran Narayanaswami | | Nov 8 | [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) | Bo Liu | | Nov 1 | [The Option Keyboard: Combining Skills in Reinforcement Learning](https://arxiv.org/abs/2106.13105) | Yulin Zhang | | Oct 18 | [Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation](http://proceedings.mlr.press/v139/dance21a.html) | Zifan Xu | | Oct 11 | [Imitation by Predicting Observations](https://arxiv.org/abs/2107.03851) | Mauricio Tec | | Sept 27 | [Counterfactual Credit Assignment in Model-Free Reinforcement Learning](http://proceedings.mlr.press/v139/mesnard21a.html) | Zizhao Wang | | Sept 20 | [UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2010.02974) | William Macke | | Sept 13 | [PsiPhi-Learning: Reinforcement Learning with Demonstrations
using Successor Features and Inverse Temporal Difference Learning](http://proceedings.mlr.press/v139/filos21a/filos21a.pdf) | Ishan Durugkar | ## Summer 2021 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | August 17 | [Phasic Policy Gradient](http://proceedings.mlr.press/v139/cobbe21a/cobbe21a.pdf) | Caroline Wang | | July 27 | [Reward is enough](https://www.sciencedirect.com/science/article/pii/S0004370221000862) | Sai Kiran Narayanaswami | ## Spring 2021 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | May 7 | [Robust Multi-Agent Reinforcement Learning with Model Uncertainty](https://nips.cc/virtual/2020/public/poster_774412967f19ea61d448977ad9749078.html) | Jiaxun Cui | | April 23 | [MOPO: Model-based Offline Policy Optimization](https://arxiv.org/abs/2005.13239) | Wonjoon Guo | | April 9 | [Discovering a set of policies for the worst case reward](https://openreview.net/forum?id=PUkhWz65dy5) | Bo Liu | | April 2 | [Novelty Search in Representational Space for Sample Efficient Exploration](https://arxiv.org/abs/2009.13579) | Sai Kiran Narayanaswami | | March 26 | [Autonomous navigation of stratospheric balloons using reinforcement learning](https://www.nature.com/articles/s41586-020-2939-8.epdf) | Haresh Karnan | | March 5 | [First return, then explore](https://www.nature.com/articles/s41586-020-03157-9) | Brad Knox | | Feb 26 | [Expected Eligibility Traces](https://www.aaai.org/AAAI21Papers/AAAI-10339.vanHasseltHP.pdf) | Reuth Mirsky | | Feb 12 | [Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design](https://arxiv.org/abs/2012.02096) | Sid Desai | | Jan 29 | [Discovering Reinforcement Learning Algorithms](https://arxiv.org/abs/2007.08794) | Ishan Durugkar | ## Fall 2020 | Date | Paper | Discussion Leader | |----------|:-------------:|------:| | Nov 20 | [The Value Equivalence Principle for Model-Based Reinforcement Learning](https://arxiv.org/abs/2011.03506) | Mauricio Tec | | Nov 13 | [Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills](https://arxiv.org/abs/2002.03647) | Caleb Chuck | | Oct 30 | [What can I do here? A Theory of Affordances in Reinforcement Learning](https://arxiv.org/abs/2006.15085) | Sid Desai | | Oct 23 | [Domain Adaptive Imitation Learning](https://proceedings.icml.cc/static/paper_files/icml/2020/1732-Paper.pdf) | Mingyo Seo | | Oct 16 | [ROMA: Multi-Agent Reinforcement Learning with Emergent Roles](https://arxiv.org/abs/2003.08039) | Ishan Durugkar | | Oct 9 | [CURL: Contrastive Unsupervised Representations for Reinforcement Learning](https://proceedings.icml.cc/static/paper_files/icml/2020/5951-Paper.pdf) | Haresh Karnan | | Oct 2 | [Intrinsic Reward Driven Imitation Learning via Generative Model](https://arxiv.org/abs/2006.15061) | Jordan Schneider | | Sept 25 | [Keypoints into the Future: Self-Supervised Correspondence in Model-Based Reinforcement Learning](https://arxiv.org/abs/2009.05085) | Yifeng Zhu | | Sept 18 | [Invariant Causal Prediction for Block MDPs](https://arxiv.org/abs/2003.06016) | Mauricio Tec | Usual Meeting Times ============================================================== The group will meet at 2 p.m. on *Fridays* at GDC 3.516.
Meeting time and place may change on rare occasion. The logo on the top right is taken from [this medium blog](https://medium.com/dataseries/how-deepmind-builds-more-efficient-multi-task-reinforcement-learning-agents-1cd017d46c25)