Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs (2026)
Chi Zhang, Wenxuan Ding, Jiale Liu, Mingrui Wu, Qingyun Wu and Ray Mooney
Vision-Language Models (VLMs) have shown strong multimodal reasoning capabilities on Visual-Question-Answering (VQA) benchmarks. However, their robustness against textual misinformation remains under-explored. While existing research has studied the effect of misinformation in text-only domains, it is not clear how VLMs arbitrate between contradictory information from different modalities. To bridge the gap, we first propose the CONTEXT-VQA (i.e., Conflicting Text) dataset, consisting of image-question pairs together with systematically generated persuasive prompts that deliberately conflict with visual evidence. Then, a thorough evaluation framework is designed and executed to benchmark the susceptibility of various models to these conflicting multimodal inputs. Comprehensive experiments over 11 state-of-the-art VLMs reveal that these models are indeed vulnerable to misleading textual prompts, often overriding clear visual evidence in favor of the conflicting text, and show an average performance drop of over 48.2 percent after only one round of persuasive conversation. Our findings highlight a critical limitation in current VLMs and underscore the need for improved robustness against textual manipulation.
Citation:
European Chapter of the Association for Computational Linguistics (EACL) (2026).
Bibtex:
@inproceedings{zhang:eacl26,
  title={Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs},
  author={Chi Zhang and Wenxuan Ding and Jiale Liu and Mingrui Wu and Qingyun Wu and Ray Mooney},
  booktitle={European Chapter of the Association for Computational Linguistics (EACL)},
  month={March},
  url="http://www.cs.utexas.edu/users/ai-labpub-view.php?PubID=128144",
  year={2026}
}
People
Raymond J. Mooney
Faculty
mooney [at] cs utexas edu
Areas of Interest
Deep Learning
Language and Vision
Machine Learning
Labs
Machine Learning