Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Lexical Semantics

Lexical semantics concerns the representation and use of word meanings in natural language processing. Our work in the area has focused on learning word meanings for use in semantic parsing and, more recently, on improving distributional (vector space) models of word meaning. Lexical semantics is part of our research on natural language learning.
  1. Multi-Modal Word Synset Induction
    [Details] [PDF]
    Jesse Thomason and Raymond J. Mooney
    In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), 4116--4122, Melbourne, Australia, 2017.
    A word in natural language can be polysemous, having multiple meanings, as well as synonymous, meaning the same thing as other words. Word sense induction attempts to find the senses of polysemous words. Synonymy detection attempts to find when two words are interchangeable. We combine these tasks, first inducing word senses and then detecting similar senses to form word-sense synonym sets (synsets) in an unsupervised fashion. Given pairs of images and text with noun phrase labels, we perform synset induction to produce collections of underlying concepts described by one or more noun phrases. We find that considering multi-modal features from both visual and textual context yields better induced synsets than using either context alone. Human evaluations show that our unsupervised, multi-modally induced synsets are comparable in quality to annotation-assisted ImageNet synsets, achieving about 84% of ImageNet synsets' approval.
    ML ID: 344
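    The synset induction pipeline described in the entry above can be illustrated with a small clustering sketch. The code below is a hypothetical simplification, not the paper's algorithm: it clusters each noun phrase's multi-modal feature vectors into senses and then greedily merges similar senses into synsets; the feature construction, cluster counts, and similarity threshold are all assumptions.

```python
# Hypothetical sketch: cluster noun-phrase instances into senses, then merge
# similar senses across phrases into synsets. Dimensions, cluster counts, and
# the merge threshold are illustrative assumptions, not the published method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def induce_senses(instances, n_senses=2):
    """instances: (n, d) array of concatenated textual+visual features for one noun phrase."""
    k = min(n_senses, len(instances))
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(instances)
    return [instances[labels == i].mean(axis=0) for i in range(k)]

def induce_synsets(phrase_instances, merge_threshold=0.8):
    """phrase_instances: dict mapping noun phrase -> (n, d) feature matrix."""
    senses = []
    for phrase, feats in phrase_instances.items():
        senses += [(phrase, c) for c in induce_senses(np.asarray(feats))]
    synsets = []  # each synset: a set of phrases plus a representative centroid
    for phrase, centroid in senses:
        for synset in synsets:
            if cosine_similarity([centroid], [synset["centroid"]])[0, 0] >= merge_threshold:
                synset["phrases"].add(phrase)
                break
        else:
            synsets.append({"phrases": {phrase}, "centroid": centroid})
    return [s["phrases"] for s in synsets]
```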
  2. Continuously Improving Natural Language Understanding for Robotic Systems through Semantic Parsing, Dialog, and Multi-modal Perception
    [Details] [PDF]
    Jesse Thomason
    November 2016. PhD proposal, Department of Computer Science, The University of Texas at Austin.
    Robotic systems that interact with untrained human users must be able to understand and respond to natural language commands and questions. If a person requests ``take me to Alice's office'', the system and person must know that Alice is a person who owns some unique office. Similarly, if a person requests ``bring me the heavy, green mug'', the system and person must both know ``heavy'', ``green'', and ``mug'' are properties that describe an object in the environment, and have similar ideas about which objects those properties apply to. To facilitate deployment, methods to achieve these goals should require little initial in-domain data.

    We present completed work on understanding human language commands using sparse initial resources for semantic parsing. Clarification dialog with humans simultaneously resolves misunderstandings and generates more training data for better downstream parser performance. We introduce multi-modal grounding classifiers to give the robotic system perceptual contexts to understand object properties like ``green'' and ``heavy''. Additionally, we introduce and explore the task of word sense synonym set induction, which aims to discover polysemy and synonymy; this is helpful in the presence of sparse data and ambiguous properties such as ``light'' (light-colored versus lightweight).

    We propose to combine these orthogonal components into an integrated robotic system that understands human commands involving both static domain knowledge (such as who owns what office) and perceptual grounding (such as object retrieval). Additionally, we propose to strengthen the perceptual grounding component by performing word sense synonym set induction on object property words. We offer several long-term proposals to improve such an integrated system: exploring novel objects using only the context-necessary set of behaviors, a more natural learning paradigm for perception, and leveraging linguistic accommodation to improve parsing.

    ML ID: 338
  3. PIC a Different Word: A Simple Model for Lexical Substitution in Context
    [Details] [PDF]
    Stephen Roller and Katrin Erk
    In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-16), 1121-1126, San Diego, California, 2016.
    The Lexical Substitution task involves selecting and ranking lexical paraphrases for a target word in a given sentential context. We present PIC, a simple measure for estimating the appropriateness of substitutes in a given context. PIC outperforms another simple, comparable model proposed in recent work, especially when selecting substitutes from the entire vocabulary. Analysis shows that PIC improves over baselines by incorporating frequency biases into predictions.
    ML ID: 335
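    As a rough illustration of ranking substitutes against both a target word and its context, the sketch below scores candidates with word embeddings. It is deliberately simplified and is not the published PIC measure; the embedding source and the additive scoring are assumptions.

```python
# Simplified, hypothetical substitute ranker: score each candidate by its fit
# to the target word and to the surrounding context words. This is NOT the
# published PIC formula, only an illustration of context-aware ranking.
import numpy as np

def rank_substitutes(target, context_words, candidates, vectors):
    # vectors: dict mapping word -> unit-normalized embedding (np.array).
    def score(cand):
        fit_target = vectors[cand] @ vectors[target]
        fit_context = np.mean([vectors[cand] @ vectors[c] for c in context_words])
        return fit_target + fit_context
    return sorted(candidates, key=score, reverse=True)
```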
  4. MGNC-CNN: A Simple Approach to Exploiting Multiple Word Embeddings for Sentence Classification
    [Details] [PDF]
    Ye Zhang and Stephen Roller and Byron Wallace
    In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-16), 1522--1527, San Diego, California, 2016.
    We introduce a novel, simple convolution neural network (CNN) architecture -- multi-group norm constraint CNN (MGNC-CNN) -- that capitalizes on multiple sets of word embeddings for sentence classification. MGNC-CNN extracts features from input embedding sets independently and then joins these at the penultimate layer in the network to form a final feature vector. We then adopt a group regularization strategy that differentially penalizes weights associated with the subcomponents generated from the respective embedding sets. This model is much simpler than comparable alternative architectures and requires substantially less training time. Furthermore, it is flexible in that it does not require input word embeddings to be of the same dimensionality. We show that MGNC-CNN consistently outperforms baseline models.
    ML ID: 334
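    The architecture described in the entry above is simple enough to sketch. The following is a minimal, hypothetical PyTorch rendering of the idea: one small convolutional feature extractor per embedding set, concatenation at the penultimate layer, and a per-group penalty on the classifier weights. All dimensions, filter sizes, and penalty weights are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sketch of the MGNC-CNN idea in PyTorch. Filter sizes, feature
# counts, and regularization weights are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGNCCNN(nn.Module):
    def __init__(self, embed_dims, n_filters=100, kernel_size=3, n_classes=2):
        super().__init__()
        # One 1-D convolution per embedding set; sets may differ in dimensionality.
        self.convs = nn.ModuleList(
            [nn.Conv1d(d, n_filters, kernel_size, padding=1) for d in embed_dims]
        )
        self.out = nn.Linear(n_filters * len(embed_dims), n_classes)
        self.n_filters = n_filters

    def forward(self, embedded_inputs):
        # embedded_inputs: list of tensors, one per embedding set,
        # each of shape (batch, seq_len, embed_dim_i).
        groups = []
        for conv, x in zip(self.convs, embedded_inputs):
            h = F.relu(conv(x.transpose(1, 2)))  # (batch, n_filters, seq_len)
            groups.append(h.max(dim=2).values)   # max-pool over time
        return self.out(torch.cat(groups, dim=1))  # join groups at the penultimate layer

    def group_penalty(self, lambdas):
        # Differentially penalize the classifier weights tied to each embedding set.
        penalty = 0.0
        for i, lam in enumerate(lambdas):
            w = self.out.weight[:, i * self.n_filters:(i + 1) * self.n_filters]
            penalty = penalty + lam * w.norm(2)
        return penalty
```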
  5. Representing Meaning with a Combination of Logical and Distributional Models
    [Details] [PDF]
    I. Beltagy and Stephen Roller and Pengxiang Cheng and Katrin Erk and Raymond J. Mooney
    Computational Linguistics (Special Issue on Formal Distributional Semantics), 42(4), 2016.
    NLP tasks differ in the semantic information they require, and at this time no single semantic representation fulfills all requirements. Logic-based representations characterize sentence structure, but do not capture the graded aspect of meaning. Distributional models give graded similarity ratings for words and phrases, but do not capture sentence structure in the same detail as logic-based approaches. So it has been argued that the two are complementary. We adopt a hybrid approach that combines logical and distributional semantics using probabilistic logic, specifically Markov Logic Networks (MLNs). In this paper, we focus on the three components of a practical system: 1) Logical representation focuses on representing the input problems in probabilistic logic. 2) Knowledge base construction creates weighted inference rules by integrating distributional information with other sources. 3) Probabilistic inference involves solving the resulting MLN inference problems efficiently. To evaluate our approach, we use the task of textual entailment (RTE), which can utilize the strengths of both logic-based and distributional representations. In particular, we focus on the SICK dataset, where we achieve state-of-the-art results. We also release a lexical entailment dataset of 10,213 rules extracted from the SICK dataset, which is a valuable resource for evaluating lexical entailment systems.
    ML ID: 316
  6. Inclusive yet Selective: Supervised Distributional Hypernymy Detection
    [Details] [PDF]
    Stephen Roller and Katrin Erk and Gemma Boleda
    In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), 1025--1036, Dublin, Ireland, August 2014.
    We test the Distributional Inclusion Hypothesis, which states that hypernyms tend to occur in a superset of contexts in which their hyponyms are found. We find that this hypothesis only holds when it is applied to relevant dimensions. We propose a robust supervised approach that achieves accuracies of .84 and .85 on two existing datasets and that can be interpreted as selecting the dimensions that are relevant for distributional inclusion.
    ML ID: 306
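    To make the idea of selecting relevant dimensions concrete, the sketch below trains a linear classifier over per-dimension inclusion features for candidate (hyponym, hypernym) pairs; the learned weights indicate which dimensions carry the inclusion signal. The feature construction is an illustrative assumption, not the paper's exact formulation.

```python
# Hypothetical sketch of supervised distributional hypernymy detection:
# per-dimension features measuring how much of the hyponym's context mass the
# candidate hypernym covers, fed to a linear classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def inclusion_features(hypo_vec, hyper_vec):
    covered = np.minimum(hypo_vec, hyper_vec)        # hyponym mass covered by the hypernym
    missing = np.maximum(hypo_vec - hyper_vec, 0.0)  # hyponym mass the hypernym lacks
    return np.concatenate([covered, missing])

def train_detector(pairs, labels, vectors):
    # pairs: list of (hyponym, hypernym) strings; vectors: dict word -> np.array.
    X = np.array([inclusion_features(vectors[a], vectors[b]) for a, b in pairs])
    return LogisticRegression(max_iter=1000).fit(X, np.array(labels))
```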
  7. UTexas: Natural Language Semantics using Distributional Semantics and Probabilistic Logic
    [Details] [PDF]
    I. Beltagy and Stephen Roller and Gemma Boleda and Katrin Erk and Raymond J. Mooney
    In The 8th Workshop on Semantic Evaluation (SemEval-2014), 796--801, Dublin, Ireland, August 2014.
    We represent natural language semantics by combining logical and distributional information in probabilistic logic. We use Markov Logic Networks (MLN) for the RTE task, and Probabilistic Soft Logic (PSL) for the STS task. The system is evaluated on the SICK dataset. Our best system achieves 73% accuracy on the RTE task, and a Pearson's correlation of 0.71 on the STS task.
    ML ID: 305
  8. A Multimodal LDA Model Integrating Textual, Cognitive and Visual Modalities
    [Details] [PDF]
    Stephen Roller and Sabine Schulte im Walde
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 1146--1157, Seattle, WA, October 2013.
    Recent investigations into grounded models of language have shown that holistic views of language and perception can provide higher performance than independent views. In this work, we improve a two-dimensional multimodal version of Latent Dirichlet Allocation (Andrews et al., 2009) in various ways. (1) We outperform text-only models in two different evaluations, and demonstrate that low-level visual features are directly compatible with the existing model. (2) We present a novel way to integrate visual features into the LDA model using unsupervised clusters of images. The clusters are directly interpretable and improve on our evaluation tasks. (3) We provide two novel ways to extend the bimodal models to support three or more modalities. We find that the three-, four-, and five-dimensional models significantly outperform models using only one or two modalities, and that nontextual modalities each provide separate, disjoint knowledge that cannot be forced into a shared, latent structure.
    ML ID: 294
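    One of the simplest ways to fold visual information into a topic model, in the spirit of the image-cluster integration described above, is to cluster image features and append the cluster IDs to each document as pseudo-words before running standard LDA. The sketch below does exactly that; it is a hypothetical simplification, not the paper's bimodal model, and the cluster and topic counts are assumptions.

```python
# Hypothetical sketch: represent images by unsupervised cluster IDs and treat
# those IDs as extra "visual words" in each document before fitting LDA.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def multimodal_lda(texts, image_features, n_visual_clusters=50, n_topics=20):
    # texts: list of strings; image_features: list of (n_images_i, d) arrays, one per text.
    all_feats = np.vstack(image_features)
    km = KMeans(n_clusters=n_visual_clusters, n_init=10, random_state=0).fit(all_feats)
    docs = []
    for text, feats in zip(texts, image_features):
        visual_tokens = " ".join(f"vis{c}" for c in km.predict(feats))
        docs.append(text + " " + visual_tokens)
    counts = CountVectorizer().fit_transform(docs)
    return LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(counts)
```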
  9. Identifying Phrasal Verbs Using Many Bilingual Corpora
    [Details] [PDF] [Poster]
    Karl Pichotta and John DeNero
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 636--646, Seattle, WA, October 2013.
    We address the problem of identifying multiword expressions in a language, focusing on English phrasal verbs. Our polyglot ranking approach integrates frequency statistics from translated corpora in 50 different languages. Our experimental evaluation demonstrates that combining statistical evidence from many parallel corpora using a novel ranking-oriented boosting algorithm produces a comprehensive set of English phrasal verbs, achieving performance comparable to a human-curated set.
    ML ID: 293
  10. Latent Variable Models of Distributional Lexical Semantics
    [Details] [PDF]
    Joseph Reisinger
    PhD Thesis, Department of Computer Science, University of Texas at Austin, May 2012.
    In order to respond to increasing demand for natural language interfaces—and provide meaningful insight into user query intent—fast, scalable lexical semantic models with flexible representations are needed. Human concept organization is a rich phenomenon that has yet to be accounted for by a single coherent psychological framework: Concept generalization is captured by a mixture of prototype and exemplar models, and local taxonomic information is available through multiple overlapping organizational systems. Previous work in computational linguistics on extracting lexical semantic information from unannotated corpora does not provide adequate representational flexibility and hence fails to capture the full extent of human conceptual knowledge. In this thesis I outline a family of probabilistic models capable of capturing important aspects of the rich organizational structure found in human language that can predict contextual variation, selectional preference and feature-saliency norms to a much higher degree of accuracy than previous approaches. These models account for cross-cutting structure of concept organization—i.e. selective attention, or the notion that humans make use of different categorization systems for different kinds of generalization tasks—and can be applied to Web-scale corpora. Using these models, natural language systems will be able to infer a more comprehensive set of semantic relations, which in turn may yield improved systems for question answering, text classification, machine translation, and information retrieval.
    ML ID: 309
  11. Cross-Cutting Models of Lexical Semantics
    [Details] [PDF] [Slides (PDF)]
    Joseph Reisinger and Raymond Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), 1405-1415, July 2011.
    Context-dependent word similarity can be measured over multiple cross-cutting dimensions. For example, lung and breath are similar thematically, while authoritative and superficial occur in similar syntactic contexts, but share little semantic similarity. Both of these notions of similarity play a role in determining word meaning, and hence lexical semantic models must take them both into account. Towards this end, we develop a novel model, Multi-View Mixture (MVM), that represents words as multiple overlapping clusterings. MVM finds multiple data partitions based on different subsets of features, subject to the marginal constraint that feature subsets are distributed according to Latent Dirichlet Allocation. Intuitively, this constraint favors feature partitions that have coherent topical semantics. Furthermore, MVM uses soft feature assignment, hence the contribution of each data point to each clustering view is variable, isolating the impact of data only to views where they assign the most features. Through a series of experiments, we demonstrate the utility of MVM as an inductive bias for capturing relations between words that are intuitive to humans, outperforming related models such as Latent Dirichlet Allocation.
    ML ID: 262
  12. A Mixture Model with Sharing for Lexical Semantics
    [Details] [PDF] [Slides (PDF)]
    Joseph Reisinger and Raymond J. Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), 1173--1182, MIT, Massachusetts, USA, October 9--11 2010.
    We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: Polysemous word usage is often governed by some common background metaphoric usage (e.g. the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We demonstrate the applicability of tiered clustering, highlighting particular cases where modeling shared structure is beneficial and where it can be detrimental.
    ML ID: 252
  13. Cross-cutting Models of Distributional Lexical Semantics
    [Details] [PDF] [Slides (PDF)]
    Joseph S. Reisinger
    June 2010. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    In order to respond to increasing demand for natural language interfaces—and provide meaningful insight into user query intent—fast, scalable lexical semantic models with flexible representations are needed. Human concept organization is a rich epiphenomenon that has yet to be accounted for by a single coherent psychological framework: Concept generalization is captured by a mixture of prototype and exemplar models, and local taxonomic information is available through multiple overlapping organizational systems. Previous work in computational linguistics on extracting lexical semantic information from the Web does not provide adequate representational flexibility and hence fails to capture the full extent of human conceptual knowledge. In this proposal I will outline a family of probabilistic models capable of accounting for the rich organizational structure found in human language that can predict contextual variation, selectional preference and feature-saliency norms to a much higher degree of accuracy than previous approaches. These models account for cross-cutting structure of concept organization—i.e. the notion that humans make use of different categorization systems for different kinds of generalization tasks—and can be applied to Web-scale corpora. Using these models, natural language systems will be able to infer a more comprehensive set of semantic relations, in turn improving question answering, text classification, machine translation, and information retrieval.
    ML ID: 249
  14. Multi-Prototype Vector-Space Models of Word Meaning
    [Details] [PDF] [Slides (PDF)]
    Joseph Reisinger, Raymond J. Mooney
    In Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2010), 109-117, 2010.
    Current vector-space models of lexical semantics create a single “prototype” vector to represent the meaning of a word. However, due to lexical ambiguity, encoding word meaning with a single vector is problematic. This paper presents a method that uses clustering to produce multiple “sense-specific” vectors for each word. This approach provides a context-dependent vector representation of word meaning that naturally accommodates homonymy and polysemy. Experimental comparisons to human judgements of semantic similarity for both isolated words and words in sentential contexts demonstrate the superiority of this approach over both prototype- and exemplar-based vector-space models.
    ML ID: 241
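    The multi-prototype idea described in the entry above reduces to clustering a word's context vectors and keeping one prototype per cluster. The sketch below is a minimal illustration under assumed inputs (precomputed context vectors and a fixed cluster count), not the paper's exact clustering or similarity setup.

```python
# Minimal sketch of multi-prototype word vectors: cluster occurrence contexts,
# keep the centroids as sense prototypes, and compare words by their best
# matching prototype pair. Cluster counts and context features are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def prototypes(context_vectors, n_senses=3):
    # context_vectors: (n_occurrences, d) array, one row per occurrence of the word.
    k = min(n_senses, len(context_vectors))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(context_vectors)
    return km.cluster_centers_

def max_sim(protos_a, protos_b):
    # Word similarity as the best match between any pair of sense prototypes.
    return cosine_similarity(protos_a, protos_b).max()
```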
  15. Acquiring Word-Meaning Mappings for Natural Language Interfaces
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    Journal of Artificial Intelligence Research, 18:1-44, 2003.
    This paper focuses on a system, Wolfie (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of phrases paired with meaning representations. Wolfie is part of an integrated system that learns to parse sentences into representations such as logical database queries.
    Experimental results are presented demonstrating Wolfie's ability to learn useful lexicons for a database interface in four different natural languages. The usefulness of the lexicons learned by Wolfie is compared to that of lexicons acquired by a similar system developed by Siskind (1996), with results favorable to Wolfie. A second set of experiments demonstrates Wolfie's ability to scale to larger and more difficult, albeit artificially generated, corpora.
    In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods (Cohn, Atlas, & Ladner, 1994) attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, most results to date for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to semantic lexicons. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance.
    ML ID: 121
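    The active learning component discussed above follows the usual pool-based pattern: repeatedly train on the labeled examples, then ask a human to annotate the examples the current model is least certain about. The sketch below shows a generic uncertainty-sampling loop with a standard classifier; WOLFIE's actual selection criterion for lexicon learning is not reproduced here, and all names are illustrative.

```python
# Generic pool-based uncertainty sampling (least-confident selection), shown
# with a standard classifier purely to illustrate the active learning loop.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_pool, oracle_label, n_seed=10, n_rounds=20, batch=5):
    # X_pool: (n, d) feature matrix; oracle_label(i) returns the label of example i.
    labeled = list(range(n_seed))
    labels = [oracle_label(i) for i in labeled]
    clf = None
    for _ in range(n_rounds):
        clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], labels)
        uncertainty = 1.0 - clf.predict_proba(X_pool).max(axis=1)
        uncertainty[labeled] = -1.0                 # never re-select labeled items
        picked = np.argsort(uncertainty)[-batch:]   # most uncertain examples
        labeled += picked.tolist()
        labels += [oracle_label(i) for i in picked]
    return clf
```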
  16. Automatic Construction of Semantic Lexicons for Learning Natural Language Interfaces
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), 487-493, Orlando, FL, July 1999.
    This paper describes a system, Wolfie (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of words paired with meaning representations. Wolfie is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating Wolfie's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by Wolfie are compared to those acquired by a competing system developed by Siskind.
    ML ID: 95
  17. Semantic Lexicon Acquisition for Learning Natural Language Interfaces
    [Details] [PDF]
    Cynthia Ann Thompson
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, December 1998. 101 pages. Also appears as Technical Report AI 99-278, Artificial Intelligence Lab, University of Texas at Austin.
    A long-standing goal for the field of artificial intelligence is to enable computer understanding of human languages. A core requirement in reaching this goal is the ability to transform individual sentences into a form better suited for computer manipulation. This ability, called semantic parsing, requires several knowledge sources, such as a grammar, lexicon, and parsing mechanism.
    Building natural language parsing systems by hand is a tedious, error-prone undertaking. We build on previous research in automating the construction of such systems using machine learning techniques. The result is a combined system that learns semantic lexicons and semantic parsers from one common set of training examples. The input required is a corpus of sentence/representation pairs, where the representations are in the output format desired. A new system, Wolfie, learns semantic lexicons to be used as background knowledge by a previously developed parser acquisition system, Chill. The combined system is tested on a real world domain of answering database queries. We also compare this combination to a combination of Chill with a previously developed lexicon learner, demonstrating superior performance with our system. In addition, we show the ability of the system to learn to process natural languages other than English. Finally, we test the system on an alternate sentence representation, and on a set of large, artificial corpora with varying levels of ambiguity and synonymy.
    One difficulty in using machine learning methods for building natural language interfaces is building the required annotated corpus. Therefore, we also address this issue by using active learning to reduce the number of training examples required by both Wolfie and Chill. Experimental results show that the number of examples needed to reach a given level of performance can be significantly reduced with this method.
    ML ID: 90
  18. Semantic Lexicon Acquisition for Learning Natural Language Interfaces
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    In Proceedings of the Sixth Workshop on Very Large Corpora, Montreal, Quebec, Canada, August 1998. Also available as TR AI 98-273, Artificial Intelligence Lab, University of Texas at Austin, May 1998.
    This paper describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with representations of their meaning. The lexicon learned consists of words paired with meaning representations. WOLFIE is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating WOLFIE's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by WOLFIE are compared to those acquired by a competing system developed by Siskind (1996).
    ML ID: 89
  19. Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-96), 82-91, Philadelphia, PA, 1996.
    This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neural-network, decision-tree, rule-based, and case-based classification techniques. The specific problem tested involves disambiguating six senses of the word ``line'' using the words in the current and preceding sentence as context. The statistical and neural-network methods perform the best on this particular problem and we discuss a potential reason for this observed difference. We also discuss the role of bias in machine learning and its importance in explaining performance differences observed on specific problems.
    ML ID: 62
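    The experimental setup lends itself to a short sketch: represent each occurrence of the ambiguous word by the bag of words in its surrounding context and compare off-the-shelf classifiers on predicting the sense label. The code below is an illustrative harness under that assumption; the ``line'' corpus itself and the original seven algorithms are not reproduced.

```python
# Illustrative word sense disambiguation harness: bag-of-words context
# features with cross-validated accuracy for a few standard classifiers.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def compare_classifiers(contexts, senses):
    # contexts: list of strings (words around the ambiguous word); senses: list of labels.
    X = CountVectorizer().fit_transform(contexts)
    for name, clf in [("naive bayes", MultinomialNB()),
                      ("decision tree", DecisionTreeClassifier())]:
        scores = cross_val_score(clf, X, senses, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")
```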
  20. Corpus-Based Lexical Acquisition For Semantic Parsing
    [Details] [PDF]
    Cynthia Thompson
    February 1996. Ph.D. proposal.
    Building accurate and efficient natural language processing (NLP) systems is an important and difficult problem. There has been increasing interest in automating this process. The lexicon, or the mapping from words to meanings, is one component that is typically difficult to update and that changes from one domain to the next. Therefore, automating the acquisition of the lexicon is an important task in automating the acquisition of NLP systems. This proposal describes a system, WOLFIE (WOrd Learning From Interpreted Examples), that learns a lexicon from input consisting of sentences paired with representations of their meanings. Preliminary experimental results show that this system can learn correct and useful mappings. The correctness is evaluated by comparing a known lexicon to one learned from the training input. The usefulness is evaluated by examining the effect of using the lexicon learned by WOLFIE to assist a parser acquisition system, where previously this lexicon had to be hand-built. Future work in the form of extensions to the algorithm, further evaluation, and possible applications is discussed.
    ML ID: 57
  21. Lexical Acquisition: A Novel Machine Learning Problem
    [Details] [PDF]
    Cynthia A. Thompson and Raymond J. Mooney
    Technical Report, Artificial Intelligence Lab, University of Texas at Austin, January 1996.
    This paper defines a new machine learning problem to which standard machine learning algorithms cannot easily be applied. The problem occurs in the domain of lexical acquisition. The ambiguity and synonymy of words make it difficult to apply standard induction techniques to learn a lexicon. Additionally, negative examples are typically unavailable or difficult to construct in this domain. One approach to solving the lexical acquisition problem is presented, along with preliminary experimental results on an artificial corpus. Future work includes extending the algorithm and performing tests on a more realistic corpus.
    ML ID: 56
  22. Acquisition of a Lexicon from Semantic Representations of Sentences
    [Details] [PDF]
    Cynthia A. Thompson
    In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL-95), 335-337, Cambridge, MA, 1995.
    A system, WOLFIE, that acquires a mapping of words to their semantic representation is presented and a preliminary evaluation is performed. Tree least general generalizations (TLGGs) are computed over the representations of input sentences to help determine the representations of individual words in those sentences. The best guess for the meaning of a word is the TLGG that overlaps with the highest percentage of sentence representations in which that word appears. Some promising experimental results on a non-artificial data set are presented.
    ML ID: 45
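    The core TLGG operation can be sketched as a recursive comparison of two meaning representations: matching function symbols are kept and their arguments generalized recursively, while mismatching subtrees are replaced by a fresh variable. The representation format below (nested tuples) is an assumption for illustration, not WOLFIE's actual encoding.

```python
# Hypothetical sketch of a tree least general generalization (TLGG) over
# nested-tuple meaning representations.
from itertools import count

_fresh = count()

def tlgg(t1, t2):
    # Trees are atoms (strings) or tuples of the form (functor, arg1, arg2, ...).
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        return tuple([t1[0]] + [tlgg(a, b) for a, b in zip(t1[1:], t2[1:])])
    return f"?X{next(_fresh)}"  # mismatch: generalize to a variable

# Example: tlgg(("answer", ("capital", "texas")), ("answer", ("capital", "ohio")))
# returns ("answer", ("capital", "?X0")).
```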
  23. Integrated Learning of Words and their Underlying Concepts
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, 974-978, Seattle, WA, July 1987.
    Models of learning word meanings have generally assumed prior knowledge of the concepts to which the words refer. However, novel natural language text or discourse often presents both unknown concepts and words which refer to these concepts. Also, developmental data suggests that the learning of words and their concepts frequently occurs concurrently instead of concept learning preceding word learning. This paper presents an integrated computational model for acquiring both word meanings and their underlying concepts concurrently. This model is implemented as a word learning component added to the GENESIS explanation-based learning schema acquisition system for narrative understanding. A detailed example is described in which GENESIS learns provisional definitions for the words "kidnap", "kidnapper", and "ransom" as well as a kidnapping schema from a single narrative.
    ML ID: 208