Machine versus Human Attention in Deep Reinforcement Learning Tasks.
Sihang Guo, Ruohan Zhang, Bo Liu, Yifeng Zhu, Mary Hayhoe, Dana Ballard, and Peter Stone.
In Conference on Neural Information Processing Systems (NeurIPS), December 2021.
Deep reinforcement learning (RL) algorithms are powerful tools for solving visuomotor decision tasks. However, the trained models are often difficult to interpret, because they are represented as end-to-end deep neural networks. In this paper, we shed light on the inner workings of such trained models by analyzing the pixels that they attend to during task execution, and comparing them with the pixels attended to by humans executing the same tasks. To this end, we investigate the following two questions that, to the best of our knowledge, have not been previously studied. 1) How similar are the visual representations learned by RL agents and humans when performing the same task? and, 2) How do similarities and differences in these learned representations explain RL agents’ performance on these tasks? Specifically, we compare the saliency maps of RL agents against visual attention models of human experts when learning to play Atari games. Further, we analyze how hyperparameters of the deep RL algorithm affect the learned representations and saliency maps of the trained agents. The insights provided have the potential to inform novel algorithms for closing the performance gap between human experts and RL agents.
@InProceedings{NeurIPS2021-Guo,
author = {Sihang Guo and Ruohan Zhang and Bo Liu and Yifeng Zhu and Mary Hayhoe and Dana Ballard and Peter Stone},
title = {Machine versus Human Attention in Deep Reinforcement Learning Tasks},
booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
location = {Virtual Only},
month = {December},
year = {2021},
abstract = {
Deep reinforcement learning (RL) algorithms are powerful tools for solving
visuomotor decision tasks. However, the trained models are often difficult
to interpret, because they are represented as end-to-end deep neural
networks. In this paper, we shed light on the inner workings of such
trained models by analyzing the pixels that they attend to during task
execution, and comparing them with the pixels attended to by humans
executing the same tasks. To this end, we investigate the following
two questions that, to the best of our knowledge, have not been previously
studied. 1) How similar are the visual representations learned by RL agents
and humans when performing the same task? and, 2) How do similarities
and differences in these learned representations explain RL agents'
performance on these tasks? Specifically, we compare the saliency
maps of RL agents against visual attention models of human experts when
learning to play Atari games. Further, we analyze how hyperparameters of
the deep RL algorithm affect the learned representations and saliency maps
of the trained agents. The insights provided have the potential to inform
novel algorithms for closing the performance gap between human experts and
RL agents.
},
}