CS 395T:
Grounded Natural Language Processing

How to read research articles

Background papers recommended by Matt Lease.
  1. S. Keshav. How to Read a Paper. U. Waterloo, February 17, 2016.
  2. Alan Smith. The Task of the Referee. 1990.

Research Papers

Papers to be read and presented by students. Papers designated for "pair" presentations by two students are marked with "[2]". Each paper's presentation date is listed at the beginning of its entry.
  1. Stevan Harnad, The Symbol Grounding Problem, Physica D 42: 335-346, 1990.
  2. [2] Tadas Baltrusaitis, Chaitanya Ahuja, Louis-Philippe Morency, Multimodal Machine Learning: A Survey and Taxonomy, 2017.
  3. Matt MacMahon, Brian Stankiewicz, and Benjamin Kuipers, Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions, AAAI, 2006.
  4. David L. Chen and Raymond J. Mooney, Learning to Interpret Natural Language Navigation Instructions from Observations, In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI), 859-865, August 2011.
  5. Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences, In Proceedings of the National Conference on Artificial Intelligence (AAAI), 2016.
  6. Andrea F. Daniele, Mohit Bansal, and Matthew R. Walter. Navigational Instruction Generation as Inverse Reinforcement Learning with Neural Machine Translation, In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2017.
  7. Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 3674-3683.
  8. [2] Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi, TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments, CVPR, 2019.
  9. [2] Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox, ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks, CVPR 2020.
  10. [2] Md. Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga, A Comprehensive Survey of Deep Learning for Image Captioning, ACM Computing Surveys (October 2018).
  11. Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang. No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), full paper, Melbourne, Australia, July 15-20, 2018.
  12. Subhashini Venugopalan and Marcus Rohrbach and Jeff Donahue and Raymond J. Mooney and Trevor Darrell and Kate Saenko, Sequence to Sequence -- Video to Text, In Proceedings of the 2015 International Conference on Computer Vision (ICCV-15), Santiago, Chile, December 2015.
  13. Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, and William Yang Wang, VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research, Proceedings of the 17th CVF/IEEE International Conference on Computer Vision (ICCV 2019), Seoul, Korea.
  14. [2] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, VQA: Visual Question Answering, International Conference on Computer Vision (ICCV), 2015.
  15. Jacob Andreas, Marcus Rohrbach, Trevor Darrell, Dan Klein, Learning to Compose Neural Networks for Question Answering, NAACL 2016.
  16. Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang, Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, CVPR, 2018.
  17. Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal, TVQA+: Spatio-Temporal Grounding for Video Question Answering, ACL 2020.
  18. Sahar Kazemzadeh, Vicente Ordonez, Mark Matten, and Tamara Berg. ReferItGame: Referring to Objects in Photographs of Natural Scenes. In EMNLP, 2014.
  19. Licheng Yu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Mohit Bansal, and Tamara L. Berg. MAttNet: Modular Attention Network for Referring Expression Comprehension. In CVPR, 2018.
  20. Xintong Yu, Hongming Zhang, Yangqiu Song, Yan Song, and Changshui Zhang, What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues, EMNLP, 2019.
  21. [2] Carina Silberer, Vittorio Ferrari, and Mirella Lapata. Visually Grounded Meaning Representations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:11, 2284--2297, 2017.
  22. Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel. A Survey of Reinforcement Learning Informed by Natural Language. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019).
  23. Jesse Thomason and Jivko Sinapov and Maxwell Svetlik and Peter Stone and Raymond J. Mooney, Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy", In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16), 3477--3483, New York City, 2016.
  24. Angel Chang, Will Monroe, Manolis Savva, Christopher Potts, Christopher D. Manning, Text to 3D Scene Generation with Rich Lexical Grounding, ACL 2015.
  25. Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee, Generative Adversarial Text to Image Synthesis, ICML 2016.
  26. [2] David Harwath, Adria Recasens, Didac Suris, Galen Chuang, Antonio Torralba, and James Glass, Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input, Proceedings of the European Conference on Computer Vision (ECCV), 2018.
  27. Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu, Visually Grounded Neural Syntax Acquisition, ACL, 2019.
  28. Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, NeurIPS, 2019.
  29. Zhe Gan, Yen-Chun Chen, Linjie Li, Chen Zhu, Yu Cheng, Jingjing Liu, Large-Scale Adversarial Training for Vision-and-Language Representation Learning, NeurIPS, 2020.
  30. Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, Trevor Darrell, Generating Visual Explanations, European Conference on Computer Vision (ECCV), 2016.
  31. Ronghang Hu, Jacob Andreas, Trevor Darrell, Kate Saenko, Explainable Neural Computation via Stack Neural Module Networks, ECCV, 2018.
  32. Jialin Wu and Raymond J. Mooney, Faithful Multimodal Explanation for Visual Question Answering, Proceedings of the Second BlackboxNLP Workshop at ACL, pp. 103-112, Florence, Italy, August 2019.
  33. Weixin Liang, James Zou, Zhou Yu, ALICE: Active Learning with Contrastive Natural Language Explanations, EMNLP 2020.

Class Project Presentations

    TBA