Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia.
Chandler Smith, Marwa Abdulhai, Manfred Diaz, Marko Tesic, Rakshit S. Trivedi, Alexander Sasha Vezhnevets, Lewis Hammond, Jesse Clifton, Minsuk Chang, Edgar A. Duéñez-Guzmán, John P. Agapiou, Jayd Matyas, Danny Karmon, Akash Kundu, Aliaksei Korshuk, Ananya Ananya, Arrasy Rahman, Avinaash Anand Kulandaivel, Bain McHale, Beining Zhang, Buyantuev Alexander, Carlos Saith Rodriguez Rojas, Caroline Wang, Chetan Talele, Chenao Liu, Chichen Lin, Diana Riazi, Di Yang Shi, Emanuel Tewolde, Elizaveta Tennant, Fangwei Zhong, Fuyang Cui, Gang Zhao, Gema Parreño Piqueras, Hyeonggeun Yun, Ilya Makarov, Jiaxun Cui, Jebish Purbey, Jim Dilkes, Jord Nguyen, Lingyun Xiao, Luis Felipe Giraldo, Manuela Chacon-Chamorro, Manuel Sebastian Rios Beltran, Marta Emili García Segura, Mengmeng Wang, Mogtaba Alim, Nicanor Quijano, Nico Schiavone, Olivia Macmillan-Scott, Oswaldo Peña, Peter Stone, Ram Mohan Rao Kadiyala, Rolando Fernandez, Ruben Manrique, Sunjia Lu, Sheila A. McIlraith, Shamika Dhuri, Shuqing Shi, Siddhant Gupta, Sneheel Sarangi, Sriram Ganapathi Subramanian, Taehun Cha, Toryn Q. Klassen, Wenming Tu, Weijian Fan, Wu Ruiyang, Xue Feng, Yali Du, Yang Liu, Yiding Wang, Yipeng Kang, Yoonchang Sung, Yuxuan Chen, Zhaowei Zhang, Zhihan Wang, Zhiqiang Wu, Ziang Chen, Zilong Zheng, Zixia Jia, Ziyan Wang, Dylan Hadfield-Menell, Natasha Jaques, Tim Baarslag, Jose Hernandez-Orallo, and Joel Z. Leibo.
In Conference on Neural Information Processing Systems, September 2025.
Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human and artificial agents. These interactions represent a critical frontier for LLM-based agents, yet existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. In this paper, we introduce a method for evaluating the ability of LLM-based agents to cooperate in zero-shot, mixed-motive environments using Concordia, a natural language multi-agent simulation environment. Our method measures general cooperative intelligence by testing an agent's ability to identify and exploit opportunities for mutual gain across diverse partners and contexts. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains across a suite of diverse scenarios ranging from negotiation to collective action problems. Our findings reveal significant gaps between current agent capabilities and the robust generalization required for reliable cooperation, particularly in scenarios demanding persuasion and norm enforcement.
@InProceedings{smith2025evaluating,
author = {Chandler Smith and Marwa Abdulhai and Manfred Diaz and Marko Tesic and Rakshit S. Trivedi and Alexander Sasha Vezhnevets and Lewis Hammond and Jesse Clifton and Minsuk Chang and Edgar A. Duéñez-Guzmán and John P. Agapiou and Jayd Matyas and Danny Karmon and Akash Kundu and Aliaksei Korshuk and Ananya Ananya and Arrasy Rahman and Avinaash Anand Kulandaivel and Bain McHale and Beining Zhang and Buyantuev Alexander and Carlos Saith Rodriguez Rojas and Caroline Wang and Chetan Talele and Chenao Liu and Chichen Lin and Diana Riazi and Di Yang Shi and Emanuel Tewolde and Elizaveta Tennant and Fangwei Zhong and Fuyang Cui and Gang Zhao and Gema Parreño Piqueras and Hyeonggeun Yun and Ilya Makarov and Jiaxun Cui and Jebish Purbey and Jim Dilkes and Jord Nguyen and Lingyun Xiao and Luis Felipe Giraldo and Manuela Chacon-Chamorro and Manuel Sebastian Rios Beltran and Marta Emili García Segura and Mengmeng Wang and Mogtaba Alim and Nicanor Quijano and Nico Schiavone and Olivia Macmillan-Scott and Oswaldo Peña and Peter Stone and Ram Mohan Rao Kadiyala and Rolando Fernandez and Ruben Manrique and Sunjia Lu and Sheila A. McIlraith and Shamika Dhuri and Shuqing Shi and Siddhant Gupta and Sneheel Sarangi and Sriram Ganapathi Subramanian and Taehun Cha and Toryn Q. Klassen and Wenming Tu and Weijian Fan and Wu Ruiyang and Xue Feng and Yali Du and Yang Liu and Yiding Wang and Yipeng Kang and Yoonchang Sung and Yuxuan Chen and Zhaowei Zhang and Zhihan Wang and Zhiqiang Wu and Ziang Chen and Zilong Zheng and Zixia Jia and Ziyan Wang and Dylan Hadfield-Menell and Natasha Jaques and Tim Baarslag and Jose Hernandez-Orallo and Joel Z. Leibo},
title = {Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia},
booktitle = {Conference on Neural Information Processing Systems},
year = {2025},
month = {September},
location = {San Diego, United States},
abstract = {Large Language Model (LLM) agents have demonstrated impressive capabilities for
social interaction and are increasingly being deployed in situations where they
might engage with both human and artificial agents. These interactions represent
a critical frontier for LLM-based agents, yet existing evaluation methods fail to
measure how well these capabilities generalize to novel social situations. In
this paper, we introduce a method for evaluating the ability of LLM-based agents
to cooperate in zero-shot, mixed-motive environments using Concordia, a natural
language multi-agent simulation environment. Our method measures general
cooperative intelligence by testing an agent's ability to identify and exploit
opportunities for mutual gain across diverse partners and contexts. We present
empirical results from the NeurIPS 2024 Concordia Contest, where agents were
evaluated on their ability to achieve mutual gains across a suite of diverse
scenarios ranging from negotiation to collective action problems. Our findings
reveal significant gaps between current agent capabilities and the robust
generalization required for reliable cooperation, particularly in scenarios
demanding persuasion and norm enforcement.
},
}