UTCS Artificial Intelligence
Reasoning about Actions with Large Multimodal Models (2025)
Vanya Cohen
Large multimodal models have become central to solving sequential decision-making tasks, enabling improved learning in diverse areas such as home robotics and automated software development. However, leveraging these models for sequential decision-making requires robust action reasoning capabilities, which remain a significant challenge. This thesis aims to improve and evaluate action reasoning in large multimodal models. First, we introduce a method to improve the parsing of instructional texts into action sequences by integrating external symbolic planners and planning domains during autoregressive language model decoding. Next, we propose a method that leverages the compositional structure of language instructions to improve the generalization and sample efficiency of acquiring new tasks with reinforcement learning. Last, we propose a new benchmark to evaluate the understanding of dependencies between actions described in instructional texts. Future work will focus on evaluating the world modeling limitations of frontier models. Current models struggle to reason about the effects of actions in multimodal entity state tracking tasks. We aim to extend entity state tracking evaluations to embodied domains. From this benchmark, we derive a post-training method for improving the entity state reasoning abilities of language models. Together, these contributions enhance the understanding of how models reason about actions and provide insights toward their improvement for real-world sequential decision-making problems.
View:
PDF
Citation:
Ph.D. Proposal.
Bibtex:
@misc{cohen:proposal25,
  title={Reasoning about Actions with Large Multimodal Models},
  author={Vanya Cohen},
  month={October},
  note={Ph.D. Proposal},
  url="http://www.cs.utexas.edu/users/ai-labpub-view.php?PubID=128139",
  year={2025}
}
Presentation:
Slides (PDF)
People
Vanya Cohen
Ph.D. Student
vanya [at] utexas edu
Areas of Interest
Connecting Language and Perception
Deep Learning
Language and Robotics
Learning for Planning and Problem Solving
Machine Learning
Neural-Symbolic Learning
Reinforcement Learning
Labs
Machine Learning