Machine Learning Research Group | University of Texas

Publications: 2016

Stacking With Auxiliary Features for Combining Supervised and Unsupervised Ensembles
[Details] [PDF]
Nazneen Fatema Rajani and Raymond J. Mooney
In Proceedings of the Ninth Text Analysis Conference (TAC 2016), 2016.
We propose stacking with auxiliary features (SWAF) that combines supervised and unsupervised methods to ensemble multiple systems for the Tri-lingual Entity Discovery and Linking (TEDL) 2016 evaluation. We use the TEDL 2015 systems for training and EDL 2016 systems for evaluating our algorithm. We perform a post-processing step on the outputs obtained from the classifier so as to aggregate them into one final system.
ML ID: 356
Stacking With Auxiliary Features: Improved Ensembling for Natural Language and Vision
[Details] [PDF] [Slides (PDF)]
Nazneen Fatema Rajani
November 2016. PhD proposal, Department of Computer Science, The University of Texas at Austin.
Ensembling methods are well known in machine learning for improving prediction accuracy. However, they are limited in the sense that they cannot effectively discriminate among underlying component models. Some models perform better at certain types of input instances than other models. The measure of how good a model is can sometimes be gauged from "where" it extracted the output and "why" it made the prediction. This information can be exploited to leverage the component models in an ensemble. In this proposal, we present stacking with auxiliary features that integrates relevant information from multiple sources to improve ensembling. We use two types of auxiliary features: instance features and provenance features. The instance features enable the stacker to discriminate across input instances while the provenance features enable the stacker to discriminate across component systems. When the two are combined, our algorithm learns to rely on systems that agree not just on an output but also on the provenance of this output, in conjunction with the input instance type.
We demonstrate our approach on three very different and difficult problems: Cold Start Slot Filling, Tri-lingual Entity Discovery and Linking, and ImageNet Object Detection. The first two problems are well known tasks in Natural Language Processing, and the third one is in the domain of Computer Vision. Our algorithm obtains state-of-the-art results on the first two tasks and significant improvements on the ImageNet task, thus verifying the power and generality of our approach. We also present a novel approach using stacking for combining systems that do not have training data in an unsupervised ensemble with systems that do have training data. Our combined approach achieves state-of-the-art results on the Cold Start Slot Filling and Tri-lingual Entity Discovery and Linking tasks, beating our own prior performance on ensembling just the supervised systems.
We propose several short-term and long-term extensions to our work. In the short term, we focus our work on using more semantic instance-level features for all three tasks, and use non-lexical features that are language independent for the two NLP tasks. In the long term, we propose to demonstrate our ensembling algorithm on the Visual Question Answering task and use textual/visual explanations as auxiliary features for stacking.
ML ID: 340
An Analysis of Using Semantic Parsing for Speech Recognition
[Details] [PDF] [Slides (PPT)]
Rodolfo Corona
2016. Undergraduate Honors Thesis, Computer Science Department, University of Texas at Austin.
This thesis explores the use of semantic parsing for improving speech recognition performance. Specifically, it explores how a semantic parser may be used in order to re-rank the n-best hypothesis list generated by an automatic speech recognition system. We also explore how system performance is affected when retraining the system's acoustic model using a portion of the re-ranked data.
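As a rough illustration of the re-ranking idea in the abstract above, the sketch below re-orders an n-best list by mixing each hypothesis's recognizer score with a semantic-parser score. The linear interpolation, the `weight` parameter, and the toy `toy_parse_score` function are all assumptions made for this example; the abstract does not say how the thesis combines the scores.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (transcript, recognizer score; higher is better)


def rerank_nbest(
    hypotheses: List[Hypothesis],
    parse_score: Callable[[str], float],
    weight: float = 0.5,
) -> List[Hypothesis]:
    """Sort hypotheses by a weighted mix of recognizer and parser scores.

    The linear interpolation is an assumption for illustration only.
    """
    def combined(item: Hypothesis) -> float:
        text, asr_score = item
        return (1.0 - weight) * asr_score + weight * parse_score(text)

    return sorted(hypotheses, key=combined, reverse=True)


def toy_parse_score(text: str) -> float:
    """Hypothetical stand-in for a semantic parser's confidence."""
    known_commands = {"bring me the mug", "go to the office"}
    return 1.0 if text in known_commands else 0.0


# The recognizer prefers "mud", but only "mug" yields a valid parse.
nbest = [("bring me the mud", 0.55), ("bring me the mug", 0.45)]
reranked = rerank_nbest(nbest, toy_parse_score, weight=0.5)
print(reranked[0][0])
```

With `weight=0.0` the parser is ignored and the original recognizer ordering is kept, which makes it easy to compare the two settings.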
ML ID: 339
Continuously Improving Natural Language Understanding for Robotic Systems through Semantic Parsing, Dialog, and Multi-modal Perception
[Details] [PDF]
Jesse Thomason
November 2016. PhD proposal, Department of Computer Science, The University of Texas at Austin.
Robotic systems that interact with untrained human users must be able to understand and respond to natural language commands and questions. If a person requests "take me to Alice's office", the system and person must know that Alice is a person who owns some unique office. Similarly, if a person requests "bring me the heavy, green mug", the system and person must both know "heavy", "green", and "mug" are properties that describe an object in the environment, and have similar ideas about to what objects those properties apply. To facilitate deployment, methods to achieve these goals should require little initial in-domain data.
We present completed work on understanding human language commands using sparse initial resources for semantic parsing. Clarification dialog with humans simultaneously resolves misunderstandings and generates more training data for better downstream parser performance. We introduce multi-modal grounding classifiers to give the robotic system perceptual contexts to understand object properties like "green" and "heavy". Additionally, we introduce and explore the task of word sense synonym set induction, which aims to discover polysemy and synonymy, which is helpful in the presence of sparse data and ambiguous properties such as "light" (light-colored versus lightweight).
We propose to combine these orthogonal components into an integrated robotic system that understands human commands involving both static domain knowledge (such as who owns what office) and perceptual grounding (such as object retrieval). Additionally, we propose to strengthen the perceptual grounding component by performing word sense synonym set induction on object property words. We offer several long-term proposals to improve such an integrated system: exploring novel objects using only the context-necessary set of behaviors, a more natural learning paradigm for perception, and leveraging linguistic accommodation to improve parsing.
ML ID: 338
Natural Language Semantics Using Probabilistic Logic
[Details] [PDF] [Slides (PPT)] [Slides (PDF)]
I. Beltagy
PhD Thesis, Department of Computer Science, The University of Texas at Austin, December 2016.
With better natural language semantic representations, computers can support more applications more efficiently as a result of better understanding of natural text. However, no single semantic representation at this time fulfills all requirements needed for a satisfactory representation. Logic-based representations like first-order logic capture many of the linguistic phenomena using logical constructs, and they come with standardized inference mechanisms, but standard first-order logic fails to capture the "graded" aspect of meaning in languages. Other approaches for semantics, like distributional models, focus on capturing "graded" semantic similarity of words and phrases but do not capture sentence structure in the same detail as logic-based approaches. However, both aspects of semantics, structure and gradedness, are important for an accurate language semantics representation.
In this work, we propose a natural language semantics representation that uses probabilistic logic (PL) to integrate logical with weighted uncertain knowledge. It combines the expressivity and the automated inference of logic with the ability to reason with uncertainty. To demonstrate the effectiveness of our semantic representation, we implement and evaluate it on three tasks, recognizing textual entailment (RTE), semantic textual similarity (STS) and open-domain question answering (QA). These tasks can utilize the strengths of our representation and the integration of logical representation and uncertain knowledge. Our semantic representation has three components, Logical Form, Knowledge Base and Inference, all of which present interesting challenges and we make new contributions in each of them.
The first component is the Logical Form, which is the primary meaning representation. We address two points, how to translate input sentences to logical form, and how to adapt the resulting logical form to PL. First, we use Boxer, a CCG-based semantic analysis tool to translate sentences to logical form. We also explore translating dependency trees to logical form. Then, we adapt the logical forms to ensure that universal quantifiers and negations work as expected.
The second component is the Knowledge Base which contains "uncertain" background knowledge required for a given problem. We collect the "relevant" lexical information from different linguistic resources, encode them as weighted logical rules, and add them to the knowledge base. We add rules from existing databases, in particular WordNet and the Paraphrase Database (PPDB). Since these are incomplete, we generate additional on-the-fly rules that could be useful. We use alignment techniques to propose rules that are relevant to a particular problem, and explore two alignment methods, one based on Robinson's resolution and the other based on graph matching. We automatically annotate the proposed rules and use them to learn weights for unseen rules.
The third component is Inference. This component is implemented for each task separately. We use the logical form and the knowledge base constructed in the previous two steps to formulate the task as a PL inference problem, then develop a PL inference algorithm that is optimized for this particular task. We explore the use of two PL frameworks, Markov Logic Networks (MLNs) and Probabilistic Soft Logic (PSL). We discuss which framework works best for a particular task, and present new inference algorithms for each framework.
ML ID: 337
Statistical Script Learning with Recurrent Neural Networks
[Details] [PDF] [Poster]
Karl Pichotta and Raymond J. Mooney
In Proceedings of the Workshop on Uphill Battles in Language Processing (UBLP) at EMNLP 2016, Austin, TX, November 2016.
We describe some of our recent efforts in learning statistical models of co-occurring events from large text corpora using Recurrent Neural Networks.
ML ID: 336
PIC a Different Word: A Simple Model for Lexical Substitution in Context
[Details] [PDF]
Stephen Roller and Katrin Erk
In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-16), 1121--1126, San Diego, California, 2016.
The Lexical Substitution task involves selecting and ranking lexical paraphrases for a target word in a given sentential context. We present PIC, a simple measure for estimating the appropriateness of substitutes in a given context. PIC outperforms another simple, comparable model proposed in recent work, especially when selecting substitutes from the entire vocabulary. Analysis shows that PIC improves over baselines by incorporating frequency biases into predictions.
ML ID: 335
MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification
[Details] [PDF]
Ye Zhang and Stephen Roller and Byron Wallace
In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-16), 1522--1527, San Diego, California, 2016.
We introduce a novel, simple convolutional neural network (CNN) architecture -- multi-group norm constraint CNN (MGNC-CNN) -- that capitalizes on multiple sets of word embeddings for sentence classification. MGNC-CNN extracts features from input embedding sets independently and then joins these at the penultimate layer in the network to form a final feature vector. We then adopt a group regularization strategy that differentially penalizes weights associated with the subcomponents generated from the respective embedding sets. This model is much simpler than comparable alternative architectures and requires substantially less training time. Furthermore, it is flexible in that it does not require input word embeddings to be of the same dimensionality. We show that MGNC-CNN consistently outperforms baseline models.
ML ID: 334
Stacking With Auxiliary Features
[Details] [PDF]
Nazneen Fatema Rajani and Raymond J. Mooney
ArXiv preprint arXiv:1605.08764, 2016.
Ensembling methods are well known for improving prediction accuracy. However, they are limited in the sense that they cannot discriminate among component models effectively. In this paper, we propose stacking with auxiliary features that learns to fuse relevant information from multiple systems to improve performance. Auxiliary features enable the stacker to rely on systems that not just agree on an output but also the provenance of the output. We demonstrate our approach on three very different and difficult problems -- the Cold Start Slot Filling, the Tri-lingual Entity Discovery and Linking and the ImageNet object detection tasks. We obtain new state-of-the-art results on the first two tasks and substantial improvements on the detection task, thus verifying the power and generality of our approach.
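A minimal sketch of how a stacker's input might be assembled from component-system confidences plus auxiliary features, as the abstract above describes. The system names, the provenance-overlap score, the instance types, and the hand-set logistic weights are all hypothetical; the abstract does not specify the feature encoding or the meta-classifier, and in practice the weights would be learned from training data.

```python
import math
from typing import Dict, List

SYSTEMS = ["sys_a", "sys_b", "sys_c"]    # hypothetical component systems
INSTANCE_TYPES = ["per", "org", "gpe"]   # hypothetical instance types


def build_features(
    confidences: Dict[str, float],
    provenance_overlap: Dict[str, float],
    instance_type: str,
) -> List[float]:
    """Concatenate confidences, provenance features, and instance features.

    Systems that did not produce this output contribute 0.0.
    """
    features = [confidences.get(name, 0.0) for name in SYSTEMS]
    features += [provenance_overlap.get(name, 0.0) for name in SYSTEMS]
    features += [1.0 if instance_type == t else 0.0 for t in INSTANCE_TYPES]
    return features


def stacker_probability(features: List[float], weights: List[float], bias: float) -> float:
    """Logistic meta-classifier: probability that the candidate output is correct."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


features = build_features(
    confidences={"sys_a": 0.9, "sys_c": 0.4},
    provenance_overlap={"sys_a": 1.0, "sys_c": 0.5},
    instance_type="org",
)
# Hand-set weights for illustration only; a real stacker would learn them.
probability = stacker_probability(features, weights=[1.0] * len(features), bias=-2.0)
print(features, probability)
```

The point of the layout is that the meta-classifier sees not only how confident each system is, but also where each output came from and what kind of instance it is, so it can learn when to trust which system.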
ML ID: 333
Improved Semantic Parsers For If-Then Statements
[Details] [PDF]
I. Beltagy and Chris Quirk
To Appear In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-16), Berlin, Germany, 2016.
Digital personal assistants are becoming both more common and more useful. The major NLP challenge for personal assistants is machine understanding: translating natural language user commands into an executable representation. This paper focuses on understanding rules written as If-Then statements, though the techniques should be portable to other semantic parsing tasks. We view understanding as structure prediction and show improved models using both conventional techniques and neural network models. We also discuss various ways to improve generalization and reduce overfitting: synthetic training data from paraphrase, grammar combinations, feature selection and ensembles of multiple systems. An ensemble of these techniques achieves a new state-of-the-art result with an 8% accuracy improvement.
ML ID: 332
Combining Supervised and Unsupervised Ensembles for Knowledge Base Population
[Details] [PDF]
Nazneen Fatema Rajani and Raymond J. Mooney
To Appear In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), 2016.
We propose an algorithm that combines supervised and unsupervised methods to ensemble multiple systems for two popular Knowledge Base Population (KBP) tasks, Cold Start Slot Filling (CSSF) and Tri-lingual Entity Discovery and Linking (TEDL). We demonstrate that it outperforms the best system for both tasks in the 2015 competition, several ensembling baselines, as well as a state-of-the-art stacking approach. The success of our technique on two different and challenging problems demonstrates the power and generality of our combined approach to ensembling.
ML ID: 331
Using Sentence-Level LSTM Language Models for Script Inference
[Details] [PDF] [Slides (PPT)] [Slides (PDF)]
Karl Pichotta and Raymond J. Mooney
In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL-16), 279--289, Berlin, Germany, 2016.
There is a small but growing body of research on statistical scripts, models of event sequences that allow probabilistic inference of implicit events from documents. These systems operate on structured verb-argument events produced by an NLP pipeline. We compare these systems with recent Recurrent Neural Net models that directly operate on raw tokens to predict sentences, finding the latter to be roughly comparable to the former in terms of predicting missing events in documents.
ML ID: 330
Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy"
[Details] [PDF]
Jesse Thomason and Jivko Sinapov and Maxwell Svetlik and Peter Stone and Raymond J. Mooney
In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16), 3477--3483, New York City, 2016.
Grounded language learning bridges words like 'red' and 'square' with robot perception. The vast majority of existing work in this space limits robot perception to vision. In this paper, we build perceptual models that use haptic, auditory, and proprioceptive data acquired through robot exploratory behaviors to go beyond vision. Our system learns to ground natural language words describing objects using supervision from an interactive human-robot "I Spy" game. In this game, the human and robot take turns describing one object among several, then trying to guess which object the other has described. All supervision labels were gathered from human participants physically present to play this game with a robot. We demonstrate that our multi-modal system for grounding natural language outperforms a traditional, vision-only grounding framework by comparing the two on the "I Spy" task. We also provide a qualitative analysis of the groundings learned in the game, visualizing what words are understood better with multi-modal sensory information as well as identifying learned word meanings that correlate with physical object properties (e.g. 'small' negatively correlates with object weight).
ML ID: 329
Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text
[Details] [PDF] [Poster]
Subhashini Venugopalan and Lisa Anne Hendricks and Raymond Mooney and Kate Saenko
In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), 1961--1966, Austin, Texas, 2016.
This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos. Specifically, we integrate both a neural language model and distributional semantics trained on large text corpora into a recent LSTM-based architecture for video description. We evaluate our approach on a collection of YouTube videos as well as two large movie description datasets, showing significant improvements in grammaticality while modestly improving descriptive quality.
ML ID: 328
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
[Details] [PDF]
Lisa Anne Hendricks and Subhashini Venugopalan and Marcus Rohrbach and Raymond Mooney and Kate Saenko and Trevor Darrell
In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR-16), 1--10, 2016.
While recent deep neural network models have achieved promising results on the image captioning task, they rely largely on the availability of corpora with paired image and sentence captions to describe objects in context. In this work, we propose the Deep Compositional Captioner (DCC) to address the task of generating descriptions of novel objects which are not present in paired image-sentence datasets. Our method achieves this by leveraging large object recognition datasets and external text corpora and by transferring knowledge between semantically similar concepts. Current deep caption models can only describe objects contained in paired image-sentence corpora, despite the fact that they are pre-trained with large object recognition datasets, namely ImageNet. In contrast, our model can compose sentences that describe novel objects and their interactions with other objects. We demonstrate our model’s ability to describe novel concepts by empirically evaluating its performance on MSCOCO and show qualitative results on ImageNet images of objects for which no paired image-sentence data exist. Further, we extend our approach to generate descriptions of objects in video clips. Our results show that DCC has distinct advantages over existing image and video captioning approaches for generating descriptions of new objects in context.
ML ID: 327
Learning Statistical Scripts with LSTM Recurrent Neural Networks
[Details] [PDF] [Slides (PPT)] [Slides (PDF)]
Karl Pichotta and Raymond J. Mooney
In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, Arizona, February 2016.
Scripts encode knowledge of prototypical sequences of events. We describe a Recurrent Neural Network model for statistical script learning using Long Short-Term Memory, an architecture which has been demonstrated to work well on a range of Artificial Intelligence tasks. We evaluate our system on two tasks, inferring held-out events from text and inferring novel events from text, substantially outperforming prior approaches on both tasks.
ML ID: 325
Representing Meaning with a Combination of Logical and Distributional Models
[Details] [PDF]
I. Beltagy and Stephen Roller and Pengxiang Cheng and Katrin Erk and Raymond J. Mooney
The special issue of Computational Linguistics on Formal Distributional Semantics, 42(4), 2016.
NLP tasks differ in the semantic information they require, and at this time no single semantic representation fulfills all requirements. Logic-based representations characterize sentence structure, but do not capture the graded aspect of meaning. Distributional models give graded similarity ratings for words and phrases, but do not capture sentence structure in the same detail as logic-based approaches. So it has been argued that the two are complementary. We adopt a hybrid approach that combines logical and distributional semantics using probabilistic logic, specifically Markov Logic Networks (MLNs). In this paper, we focus on the three components of a practical system: 1) Logical representation focuses on representing the input problems in probabilistic logic. 2) Knowledge base construction creates weighted inference rules by integrating distributional information with other sources. 3) Probabilistic inference involves solving the resulting MLN inference problems efficiently. To evaluate our approach, we use the task of textual entailment (RTE), which can utilize the strengths of both logic-based and distributional representations. In particular we focus on the SICK dataset, where we achieve state-of-the-art results. We also release a lexical entailment dataset of 10,213 rules extracted from the SICK dataset, which is a valuable resource for evaluating lexical entailment systems.
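For readers unfamiliar with the weighted rules mentioned in the abstract above, the standard Markov Logic Network definition (given here as general background, not quoted from this paper) assigns a probability to each possible world $x$ from the weights $w_i$ of the rules and the number $n_i(x)$ of true groundings of rule $i$ in $x$:

```latex
P(X = x) = \frac{1}{Z} \exp\!\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\!\Big( \sum_i w_i \, n_i(x') \Big)
```

A larger weight makes worlds that violate the rule less probable without ruling them out, which is how graded, uncertain knowledge can sit alongside logical structure.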
ML ID: 316