Learning to Sportscast: A Test of Grounded Language Acquisition

University of Texas at Austin
Department of Computer Sciences
David L. Chen, Joohyun Kim, and Raymond J. Mooney


Sample commentaries made by our system

Description of the project

Current state-of-the-art language learners require annotated corpora as training data. However, constructing such corpora is difficult and time-consuming. On the other hand, children acquire language through exposure to linguistic input in the context of a rich, relevant, perceptual environment. By connecting words and phrases to objects and events in the world, the semantics of language is grounded in perceptual experience (Harnad, 1990). Ideally, a machine learning system could learn language in a similar manner. Our ultimate goal is to build a system that can exploit the large amount of linguistic data available naturally in the world with minimal supervision.

Although there has been some interesting computational work in grounded language learning (Roy, 2002; Bailey et al., 1997; Yu & Ballard, 2004), most of the focus has been on dealing with raw perceptual data, and the complexity of the language involved has been very modest. To help make progress, we study the problem in a simulated environment that retains many of the important properties of a dynamic world with multiple agents and actions while avoiding many of the complexities of robotics and vision. Specifically, we use the Robocup simulator, which provides a fairly detailed physical simulation of robot soccer. Our immediate goal is to build a system that learns to semantically interpret and generate language in the Robocup soccer domain by observing an ongoing commentary of the game paired with the dynamic simulator state. While several groups have constructed Robocup commentator systems (Andre et al., 2000) that provide a textual natural-language (NL) transcript of the simulated game, their systems use manually developed templates and are incapable of learning.

Demo

Below are sample sportscasts produced by our system. We also include commentaries made by human commentators for comparison.

[English Version]

[Korean Version]

Steps for creating the demo clip:

  1. A rule-based system is used to extract game events from the Robocup game logs
  2. The strategic generation component of our system (trained using Iterative Generation Strategy Learning) is used to select important events to comment on
  3. The tactical generation component of our system (trained using WASPER-GEN) is used to generate natural language descriptions of the events selected
  4. FreeTTS (for English) and TextAloud (for Korean) are used to synthesize speech from the textual outputs
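
The four steps above form a simple pipeline. The following is a minimal sketch of that control flow only, not the project's actual code: the function name `sportscast` and all four stage parameters are hypothetical, and each stage is passed in as a plain callable so that no real event extractor, learned model, or speech synthesizer is assumed.

```python
from typing import Callable, Iterable, List


def sportscast(
    game_log: Iterable[str],
    extract_events: Callable[[Iterable[str]], List[str]],
    select_events: Callable[[List[str]], List[str]],
    generate_sentence: Callable[[str], str],
    synthesize: Callable[[str], None],
) -> List[str]:
    """Run the four demo steps in order and return the generated sentences.

    All stage names are illustrative placeholders (assumptions):
      extract_events    -- step 1, rule-based event extraction
      select_events     -- step 2, strategic generation (what to say)
      generate_sentence -- step 3, tactical generation (how to say it)
      synthesize        -- step 4, text-to-speech
    """
    events = extract_events(game_log)
    chosen = select_events(events)
    sentences = [generate_sentence(event) for event in chosen]
    for sentence in sentences:
        synthesize(sentence)
    return sentences
```

Keeping the stages as separate callables mirrors the split between deciding which events to mention and deciding how to word them, so either stage can be swapped without touching the other.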

Publication and Talks

Generative Alignment and Semantic Parsing for Learning from Ambiguous Supervision [PDF]
Joohyun Kim and Raymond J. Mooney
In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pp. 543-551, Beijing, China, August 2010

Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language [Abstract] [PDF] [JAIR link]
David L. Chen, Joohyun Kim, Raymond J. Mooney
In Journal of Artificial Intelligence Research (JAIR), 37, pages 397-435, 2010

Using Perceptual Context to Ground Language [slides (PDF)]
David Chen
IBM Statistical Machine Learning and Its Application (SMiLe) Workshop, October 2009

Learning to Sportscast: A Test of Grounded Language Acquisition [slides (PPT)] [slides print-version (PPT)]
David L. Chen
Research Preparation Exam, Department of Computer Sciences, The University of Texas at Austin, August 2008

Learning to Sportscast: A Test of Grounded Language Acquisition [Abstract] [PDF] [slides (PPT)] [poster (PDF)] [talk (video)]
David L. Chen and Raymond J. Mooney
In Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, July 2008.

Data

Overview

We have collected human commentaries for the championship games of the Robocup simulation league (www.robocup.org) from 2001 to 2004. The commentators typed their comments into a text box, and each comment was recorded with a timestamp. Together with the original game log files, these commentaries make it possible to replay the games with commentary.

We developed a symbolic representation of game events to simulate perception. Most of these events involve actions with the ball, such as kicking and passing, but they also include other game information, such as whether the current playmode is kickoff, offside, or corner kick. These events are automatically extracted from the game logs using a rule-based system. The events are represented as atomic formulas in predicate logic with timestamps. These constitute our meaning representations (MRs). We manually developed a context-free grammar for this formal semantic language. Note that the use of English words for predicates and constants in the MR is for human readability only; our system treats these as arbitrary conceptual tokens and must learn their connection to English words.
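
To make the representation concrete, here is a minimal sketch of a timestamped atomic formula in Python. It is an illustration under stated assumptions, not the format of the released data: the class `Event`, the function `parse_event`, the `predicate(arg, ...)` surface syntax, and example tokens such as `pass` or `purple7` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Assumed surface syntax: predicate(arg1, arg2, ...).
# This is a stand-in, not the project's manually developed grammar.
_FORMULA = re.compile(r"^\s*(\w+)\s*\(\s*(.*?)\s*\)\s*$")


@dataclass(frozen=True)
class Event:
    """One extracted game event: an atomic formula plus a timestamp."""
    timestamp: float        # time of the event in the game log (unit assumed)
    predicate: str          # arbitrary conceptual token, e.g. "pass"
    args: Tuple[str, ...]   # constants, e.g. player identifiers

    def formula(self) -> str:
        """Render the event back into predicate(arg, ...) form."""
        return f"{self.predicate}({', '.join(self.args)})"


def parse_event(timestamp: float, text: str) -> Event:
    """Parse a string such as 'pass(purple7, purple10)' into an Event."""
    match = _FORMULA.match(text)
    if match is None:
        raise ValueError(f"not an atomic formula: {text!r}")
    predicate, arg_text = match.groups()
    args = tuple(a.strip() for a in arg_text.split(",") if a.strip())
    return Event(timestamp, predicate, args)
```

For example, `parse_event(12.0, "pass(purple7, purple10)")` yields an event whose predicate is `pass` and whose arguments are the two player constants. Nothing in the structure depends on the tokens being English words, which matches the point that the learner sees them only as opaque symbols.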

Finally, we established a correspondence between the natural language commentaries and the game events. Each comment is paired with all of the events that occurred five seconds or less before the comment was made. Notice that this is a very coarse estimate and does not always capture the correct correspondence. Moreover, many sentences cannot be represented by our MR language. For evaluation purposes only, a gold-standard matching was produced by examining each comment manually and selecting the correct MR if it exists.
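
The five-second pairing rule can be sketched as follows. This is a minimal illustration, not the script used to build the released data: the function name `pair_comments_with_events`, the tuple layouts, and the example sentences and MRs are hypothetical. It also assumes timestamps are in seconds and that an event with the same timestamp as the comment counts as occurring before it.

```python
from typing import List, Tuple

Comment = Tuple[float, str]   # (timestamp in seconds, sentence)
TimedMR = Tuple[float, str]   # (timestamp in seconds, MR as a string)


def pair_comments_with_events(
    comments: List[Comment],
    events: List[TimedMR],
    window: float = 5.0,
) -> List[Tuple[str, List[str]]]:
    """Pair each comment with every event in the `window` seconds before it.

    The result is ambiguous supervision: a comment may receive several
    candidate MRs, or none at all.
    """
    paired = []
    for comment_time, sentence in comments:
        candidates = [
            mr
            for event_time, mr in events
            if 0.0 <= comment_time - event_time <= window
        ]
        paired.append((sentence, candidates))
    return paired


events = [
    (10.0, "kick(purple7)"),
    (12.0, "pass(purple7, purple10)"),
    (20.0, "playmode(corner_kick)"),
]
comments = [
    (14.0, "Purple7 passes to Purple10"),
    (40.0, "What a game"),
]
pairs = pair_comments_with_events(comments, events)
```

Here the first comment is paired with both the kick and the pass, even though only the pass is the right match, and the second comment receives no candidates at all. These are the two failure modes noted above: coarse alignment and sentences with no corresponding MR.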

Citations

Please use the following citations when referencing the sources of the data:

[English]

@InProceedings{chen:icml08,
  title = "Learning to Sportscast: A Test of Grounded Language Acquisition",
  author = "David L. Chen and Raymond J. Mooney",
  booktitle = "Proceedings of the 25th International Conference on Machine
                 Learning (ICML-2008)",
  address = "Helsinki, Finland",
  month = "July",
  year = 2008
}

[Korean]

@Article{chen:jair10,
  author =       "David L. Chen and Joohyun Kim and Raymond J. Mooney",
  title =        "Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language",
  journal =      "Journal of Artificial Intelligence Research",
  volume =       "37",
  pages =        "397--435",
  year =         "2010",
}

Downloads

Here are compressed tarballs of all the data files:

[English] : data.tar.gz
[Korean] : data-kr.tar.gz

Or you can browse and download individual components from the following directories:

[English] : Data directory
[Korean] : Data directory

Contact Information

If you have any questions or comments, please contact David Chen or Joohyun Kim.

If you are interested in reading more literature in this area, check out our reading group, CLAMP.