PixL2R: Guiding Reinforcement Learning using Natural Language by Mapping Pixels to Rewards (2020)
Reinforcement learning (RL), particularly in sparse reward settings, often requires prohibitively large numbers of interactions with the environment, thereby limiting its applicability to complex problems. To address this, several prior approaches have used natural language to guide the agent's exploration. However, these approaches typically operate on structured representations of the environment, and/or assume some structure in the natural language commands. In this work, we propose a model that directly maps pixels to rewards, given a free-form natural language description of the task, which can then be used for policy training. Our experiments on the Meta-World robot manipulation domain show that language-based rewards significantly improve learning. Further, we analyze the resulting framework using multiple ablation experiments to better understand the nature of these improvements.
PDF, arXiv
In 4th Conference on Robot Learning (CoRL), November 2020. Also presented at the 1st Language in Reinforcement Learning (LaReL) Workshop at ICML, July 2020 (Best Paper Award), and the 6th Deep Reinforcement Learning Workshop at Neural Information Processing Systems (NeurIPS), December 2020.

Prasoon Goyal Ph.D. Student pgoyal [at] cs utexas edu
Raymond J. Mooney Faculty mooney [at] cs utexas edu
Scott Niekum Faculty sniekum [at] cs utexas edu