Ensembling Visual Explanations for VQA (2017)

Nazneen Fatema Rajani, Raymond J. Mooney

Explanations make AI systems more transparent and justify their predictions. The top-ranked Visual Question Answering (VQA) systems are ensembles of multiple systems; however, there has been no work on generating explanations for such ensembles. In this paper, we propose different methods for ensembling visual explanations for VQA using the localization maps of the component systems. Our crowd-sourced human evaluation indicates that our ensemble visual explanation is superior to each individual system’s visual explanation, although the results vary depending on which individual system the ensemble is compared against and on how many individual systems agree with the ensemble model’s answer. Overall, our ensemble explanation is judged better 63% of the time when compared to any individual system’s explanation. Our algorithm is also efficient and scales linearly in the number of component systems in the ensemble.
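The abstract does not state the exact rule used to combine localization maps, so the following is only an illustrative sketch and not necessarily the authors' method. It assumes each component system produces a non-negative 2-D heat map of the same shape, that each system has a scalar weight, and that only systems agreeing with the ensemble's answer contribute. The function name `ensemble_explanation` and its parameters are hypothetical. The single loop over components shows why such a combination costs time linear in the number of systems.

```python
import numpy as np

def ensemble_explanation(maps, weights, agrees):
    """Combine per-system localization maps into one ensemble map.

    maps:    list of 2-D non-negative arrays of identical shape (hypothetical heat maps)
    weights: one non-negative weight per system (hypothetical)
    agrees:  one bool per system, True if its answer matches the ensemble's answer
    """
    maps = [np.asarray(m, dtype=float) for m in maps]
    combined = np.zeros_like(maps[0])
    total = 0.0
    # One pass over the components: cost grows linearly with their number.
    for m, w, ok in zip(maps, weights, agrees):
        if not ok:
            continue  # assumption: ignore systems that disagree with the ensemble
        peak = m.max()
        if peak > 0:
            m = m / peak  # rescale to [0, 1] so no system dominates by scale alone
        combined += w * m
        total += w
    if total == 0:
        raise ValueError("no agreeing system with positive weight")
    return combined / total  # weighted average of the agreeing systems' maps

# Toy example: three 2x2 maps; the third system disagrees and is skipped.
a = [[0, 2], [4, 0]]
b = [[1, 0], [0, 0]]
c = [[9, 9], [9, 9]]
result = ensemble_explanation([a, b, c], weights=[1, 1, 5], agrees=[True, True, False])
```

In this toy example the disagreeing third map has no influence on `result`, regardless of its large weight. Other design choices, such as penalizing regions highlighted by disagreeing systems, are equally plausible under the abstract's description.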

View:

PDF

Citation:

In Proceedings of the NIPS 2017 workshop on Visually-Grounded Interaction and Language (ViGIL), December 2017.

Presentation:

Poster

People

Raymond J. Mooney	Faculty	mooney [at] cs utexas edu
Nazneen Rajani	Ph.D. Alumni	nrajani [at] cs utexas edu

Areas of Interest

Ensemble Learning, Explainable AI, Language and Vision

Labs

Machine Learning