Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: 2015

  1. Stacked Ensembles of Information Extractors for Knowledge-Base Population by Combining Supervised and Unsupervised Approaches
    [Details] [PDF] [Slides (PDF)]
    Nazneen Fatema Rajani and Raymond J Mooney
    In Proceedings of the Eighth Text Analysis Conference (TAC 2015), November 2015.
    The UTAustin team participated in two main tasks this year - the Cold Start Slot Filling (CSSF) task and the Slot-Filler Validation/Ensembling task, which was divided into the filtering and ensembling subtasks. Our system uses stacking to ensemble multiple systems for the KBP slot filling task, as described in our ACL 2015 paper. We expand the stacking approach by allowing the classifier to also utilize additions features that are relevant to making a final decision. Stacking relies on supervised training and hence requires common systems from the 2014 data to be used as training. However, that approach has limitations on performance and therefore we propose a novel approach of combining the supervised approach with an unsupervised approach on the remaining systems. We believe this combination approach gives our best run for the ensembling task. In this paper, we also discuss strategies to handle Cold Start data which comes from multiple hops.
    ML ID: 355
  2. Statistical Script Learning with Recurrent Neural Nets
    [Details] [PDF] [Slides (PDF)]
    Karl Pichotta
    December 2015. PhD proposal, Department of Computer Science, The University of Texas at Austin.
    Statistical Scripts are probabilistic models of sequences of events. For example, a script model might encode the information that the event "Smith met with the President" should strongly predict the event "Smith spoke to the President." We present a number of results improving the state of the art of learning statistical scripts for inferring implicit events. First, we demonstrate that incorporating multiple arguments into events, yielding a more complex event representation than is used in previous work, helps to improve a co-occurrence-based script system's predictive power. Second, we improve on these results with a Recurrent Neural Network script sequence model which uses a Long Short-Term Memory component. We evaluate in two ways: first, we evaluate systems' ability to infer held-out events from documents (the "Narrative Cloze" evaluation); second, we evaluate novel event inferences by collecting human judgments.

    We propose a number of further extensions to this work. First, we propose a number of new probabilistic script models leveraging recent advances in Neural Network training. These include recurrent sequence models with different hidden unit structure and Convolutional Neural Network models. Second, we propose integrating more lexical and linguistic information into events. Third, we propose incorporating discourse relations between spans of text into event co-occurrence models, either as output by an off-the-shelf discourse parser or learned automatically. Finally, we propose investigating the interface between models of event co-occurrence and coreference resolution, in particular by integrating script information into general coreference systems.

    ML ID: 326
  3. Natural Language Video Description using Deep Recurrent Neural Networks
    [Details] [PDF] [Slides (PDF)]
    Subhashini Venugopalan
    November 2015. PhD proposal, Department of Computer Science, The University of Texas at Austin.
    For most people, watching a brief video and describing what happened (in words) is an easy task. For machines, extracting the meaning from video pixels and generating a sentence description is a very complex problem. The goal of my research is to develop models that can automatically generate natural language (NL) descriptions for events in videos. As a first step, this proposal presents deep recurrent neural network models for video to text generation. I build on recent "deep" machine learning approaches to develop video description models using a unified deep neural network with both convolutional and recurrent structure. This technique treats the video domain as another "language" and takes a machine translation approach using the deep network to translate videos to text. In my initial approach, I adapt a model that can learn on images and captions to transfer knowledge from this auxiliary task to generate descriptions for short video clips. Next, I present an end-to-end deep network that can jointly model a sequence of video frames and a sequence of words. The second part of the proposal outlines a set of models to significantly extend work in this area. Specifically, I propose techniques to integrate linguistic knowledge from plain text corpora; and attention methods to focus on objects and track their interactions to generate more diverse and accurate descriptions. To move beyond short video clips, I also outline models to process multi-activity movie videos, learning to jointly segment and describe coherent event sequences. I propose further extensions to take advantage of movie scripts and subtitle information to generate richer descriptions.
    ML ID: 324
  4. Knowledge Transfer Using Latent Variable Models
    [Details] [PDF] [Slides (PDF)]
    Ayan Acharya
    PhD Thesis, Department of Electrical and Computer Engineering, The University of Texas at Austin, August 2015.
    In several applications, scarcity of labeled data is a challenging problem that hinders the predictive capabilities of machine learning algorithms. Additionally, the distribution of the data changes over time, rendering models trained with older data less capable of discovering useful structure from the newly available data. Transfer learning is a convenient framework to overcome such problems where the learning of a model specific to a domain can benefit the learning of other models in other domains through either simultaneous training of domains or sequential transfer of knowledge from one domain to the others. This thesis explores the opportunities of knowledge transfer in the context of a few applications pertaining to object recognition from images, text analysis, network modeling and recommender systems, using probabilistic latent variable models as building blocks. Both simultaneous and sequential knowledge transfer are achieved through the latent variables, either by sharing these across multiple related domains (for simultaneous learning) or by adapting their distributions to fit data from a new domain (for sequential learning).
    ML ID: 322
  5. Inducing Grammars from Linguistic Universals and Realistic Amounts of Supervision
    [Details] [PDF]
    Dan Garrette
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, 2015.
    The best performing NLP models to date are learned from large volumes of manually-annotated data. For tasks like part-of-speech tagging and grammatical parsing, high performance can be achieved with plentiful supervised data. However, such resources are extremely costly to produce, making them an unlikely option for building NLP tools in under-resourced languages or domains.

    This dissertation is concerned with reducing the annotation required to learn NLP models, with the goal of opening up the range of domains and languages to which NLP technologies may be applied. In this work, we explore the possibility of learning from a degree of supervision that is at or close to the amount that could reasonably be collected from annotators for a particular domain or language that currently has none. We show that just a small amount of annotation input — even that which can be collected in just a few hours — can provide enormous advantages if we have learning algorithms that can appropriately exploit it.

    This work presents new algorithms, models, and approaches designed to learn grammatical information from weak supervision. In particular, we look at ways of intersecting a variety of different forms of supervision in complementary ways, thus lowering the overall annotation burden. Sources of information include tag dictionaries, morphological analyzers, constituent bracketings, and partial tree annotations, as well as unannotated corpora. For example, we present algorithms that are able to combine faster-to-obtain type-level annotation with unannotated text to remove the need for slower-to-obtain token-level annotation.

    Much of this dissertation describes work on Combinatory Categorial Grammar (CCG), a grammatical formalism notable for its use of structured, logic-backed categories that describe how each word and constituent fits into the overall syntax of the sentence. This work shows how linguistic universals intrinsic to the CCG formalism itself can be encoded as Bayesian priors to improve learning.

    ML ID: 321
  6. A Supertag-Context Model for Weakly-Supervised CCG Parser Learning
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette and Chris Dyer and Jason Baldridge and Noah A. Smith
    In Proceedings of the 2015 Conference on Computational Natural Language Learning (CoNLL-2015), 22--31, Beijing, China, 2015.
    Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that specify the syntactic configurations in which they may occur. We present a novel parsing model with the capacity to capture the associative adjacent-category relationships intrinsic to CCG by parameterizing the relationships between each constituent label and the preterminal categories directly to its left and right, biasing the model toward constituent categories that can combine with their contexts. This builds on the intuitions of Klein and Manning's (2002) "constituent-context" model, which demonstrated the value of modeling context, but has the advantage of being able to exploit the properties of CCG. Our experiments show that our model outperforms a baseline in which this context information is not captured.
    ML ID: 320
  7. Sequence to Sequence -- Video to Text
    [Details] [PDF]
    Subhashini Venugopalan and Marcus Rohrbach and Jeff Donahue and Raymond J. Mooney and Trevor Darrell and Kate Saenko
    In Proceedings of the 2015 International Conference on Computer Vision (ICCV-15), Santiago, Chile, December 2015.
    Real-world videos often have complex dynamics; and methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (sequence of frames) and output (sequence of words) of variable length. To approach this problem, we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip. Our model naturally is able to learn the temporal structure of the sequence of frames as well as the sequence model of the generated sentences, i.e. a language model. We evaluate several variants of our model that exploit different visual features on a standard set of YouTube videos and two movie description datasets (M-VAD and MPII-MD).
    ML ID: 319
  8. Stacked Ensembles of Information Extractors for Knowledge-Base Population
    [Details] [PDF] [Slides (PPT)]
    Vidhoon Viswanathan and Nazneen Fatema Rajani and Yinon Bentor and Raymond J. Mooney
    In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-15), 177-187, Beijing, China, July 2015.
    We present results on using stacking to ensemble multiple systems for the Knowledge Base Population English Slot Filling (KBP-ESF) task. In addition to using the output and confidence of each system as input to the stacked classifier, we also use features capturing how well the systems agree about the provenance of the information they extract. We demonstrate that our stacking approach outperforms the best system from the 2014 KBP-ESF competition as well as alternative ensembling methods employed in the 2014 KBP Slot Filler Validation task and several other ensembling baselines. Additionally, we demonstrate that including provenance information further increases the performance of stacking.
    ML ID: 318
  9. Language to Code: Learning Semantic Parsers for If-This-Then-That Recipes
    [Details] [PDF] [Poster]
    Chris Quirk and Raymond Mooney and Michel Galley
    In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-15), 878--888, Beijing, China, July 2015.
    Using natural language to write programs is a touchstone problem for computational linguistics. We present an approach that learns to map natural-language descriptions of simple "if-then" rules to executable code. By training and testing on a large corpus of naturally-occurring programs (called "recipes") and their natural language descriptions, we demonstrate the ability to effectively map language to code. We compare a number of semantic parsing approaches on the highly noisy training data collected from ordinary users, and find that loosely synchronous systems perform best.
    ML ID: 317
  10. Knowledge Base Population using Stacked Ensembles of Information Extractors
    [Details] [PDF]
    Vidhoon Viswanathan
    Masters Thesis, Department of Computer Science, The University of Texas at Austin, May 2015.
    The performance of relation extractors plays a significant role in automatic creation of knowledge bases from web corpus. Using automated systems to create knowledge bases from web is known as Knowledge Base Population. Text Analysis Conference conducts English Slot Filling (ESF) and Slot Filler Validation (SFV) tasks as part of its KBP track to promote research in this area. Slot Filling systems are developed to do relation extraction for specific relation and entity types. Several participating universities have built Slot Filling systems addressing different aspects employing different algorithms and techniques for these tasks.

    In this thesis, we investigate the use of ensemble learning to combine the output of existing individual Slot Filling systems. We are the first to employ Stacking, a type of ensemble learning algorithm for the task of ensembling Slot Filling systems for the KBP ESF and SFV tasks. Our approach builds an ensemble classi- fier that learns to meaningfully combine output from different Slot Filling systems and predict the correctness of extractions. Our experimental evaluation proves that Stacking is useful for ensembling SF systems. We demonstrate new state-of-the-art results for KBP ESF task. Our proposed system achieves an F1 score of 47.

    Given the complexity of developing Slot Filling systems from scratch, our promising results indicate that performance on Slot Filling tasks can be increased by ensembling existing systems in shorter timeframe. Our work promotes research and investigation into other methods for ensembling Slot Filling systems.

    ML ID: 315
  11. Learning to Interpret Natural Language Commands through Human-Robot Dialog
    [Details] [PDF] [Slides (PDF)]
    Jesse Thomason and Shiqi Zhang and Raymond Mooney and Peter Stone
    In Proceedings of the 2015 International Joint Conference on Artificial Intelligence (IJCAI), 1923--1929, Buenos Aires, Argentina, July 2015.
    Intelligent robots frequently need to understand requests from naive users through natural language. Previous approaches either cannot account for language variation, e.g., keyword search, or require gathering large annotated corpora, which can be expensive and cannot adapt to new variation. We introduce a dialog agent for mobile robots that understands human instructions through semantic parsing, actively resolves ambiguities using a dialog manager, and incrementally learns from human-robot conversations by inducing training data from user paraphrases. Our dialog agent is implemented and tested both on a web interface with hundreds of users via Mechanical Turk and on a mobile robot over several days, tasked with understanding navigation and delivery requests through natural language in an office environment. In both contexts, We observe significant improvements in user satisfaction after learning from conversations.
    ML ID: 314
  12. Translating Videos to Natural Language Using Deep Recurrent Neural Networks
    [Details] [PDF] [Slides (PDF)]
    Subhashini Venugopalan and Huijuan Xu and Jeff Donahue and Marcus Rohrbach and Raymond Mooney and Kate Saenko
    In Proceedings the 2015 Conference of the North American Chapter of the Association for Computational Linguistics -- Human Language Technologies (NAACL HLT 2015), 1494--1504, Denver, Colorado, June 2015.
    Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a small vocabulary of possible words. By transferring knowledge from 1.2M+ images with category labels and 100,000+ images with captions, our method is able to create sentence descriptions of open-domain videos with large vocabularies. We compare our approach with recent work using language generation metrics, subject, verb, and object prediction accuracy, and a human evaluation.
    ML ID: 313
  13. Unsupervised Code-Switching for Multilingual Historical Document Transcription
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette and Hannah Alpert-Abrams and Taylor Berg-Kirkpatrick and Dan Klein
    In Proceedings the 2015 Conference of the North American Chapter of the Association for Computational Linguistics -- Human Language Technologies (NAACL HLT 2015), 1036--1041, Denver, Colorado, June 2015.
    Transcribing documents from the printing press era, a challenge in its own right, is more complicated when documents interleave multiple languages—a common feature of 16th century texts. Additionally, many of these documents precede consistent orthographic conventions, making the task even harder. We extend the state-of-the-art historical OCR model of Berg-Kirkpatrick et al. (2013) to handle word-level code-switching between multiple languages. Further, we enable our system to handle spelling variability, including now-obsolete shorthand systems used by printers. Our results show average relative character error reductions of 14% across a variety of historical texts.
    ML ID: 312
  14. On the Proper Treatment of Quantifiers in Probabilistic Logic Semantics
    [Details] [PDF] [Slides (PPT)]
    I. Beltagy and Katrin Erk
    In Proceedings of the 11th International Conference on Computational Semantics (IWCS-2015), London, UK, April 2015.
    As a format for describing the meaning of natural language sentences, probabilistic logic combines the expressivity of first-order logic with the ability to handle graded information in a principled fashion. But practical probabilistic logic frameworks usually assume a finite domain in which each entity corresponds to a constant in the logic (domain closure assumption). They also assume a closed world where everything has a very low prior probability. These assumptions lead to some problems in the inferences that these systems make. In this paper, we show how to formulate Textual Entailment (RTE) inference problems in probabilistic logic in a way that takes the domain closure and closed-world assumptions into account. We evaluate our proposed technique on three RTE datasets, on a synthetic dataset with a focus on complex forms of quantification, on FraCas and on one more natural dataset. We show that our technique leads to improvements on the more natural dataset, and achieves 100% accuracy on the synthetic dataset and on the relevant part of FraCas.
    ML ID: 311
  15. Weakly-Supervised Grammar-Informed Bayesian CCG Parser Learning
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith
    In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), Austin, TX, January 2015.
    Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism in which words are associated with categories that, in combination with a small universal set of rules, specify the syntactic configurations in which they may occur. Previous work has shown that learning sequence models for CCG tagging can be improved by using priors that are sensitive to the formal properties of CCG as well as cross-linguistic universals. We extend this approach to the task of learning a full CCG parser from weak supervision. We present a Bayesian formulation for CCG parser induction that assumes only supervision in the form of an incomplete tag dictionary mapping some word types to sets of potential categories. Our approach outperforms a baseline model trained with uniform priors by exploiting universal, intrinsic properties of the CCG formalism to bias the model toward simpler, more cross-linguistically common categories.
    ML ID: 310