Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Unsupervised Learning, Clustering, and Self-Organization

Unsupervised learning does not require annotation or labeling from a human teacher; the idea is to learn the structure of the data from unlabeled examples. The most common unsupervised learning task is clustering, i.e. grouping instances into a discovered set of categories containing similar instances. Self-organizing maps in addition visualize the topology of the clusters on a map. Our work in this area includes applications on lexical semantics, topic modeling, and discovering latent class models, as well as methods for laterally connected, hierarchical, sequential-input, and growing self-organizing maps.
  1. Continually Improving Grounded Natural Language Understanding through Human-Robot Dialog
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    Jesse Thomason
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, April 2018.
    As robots become ubiquitous in homes and workplaces such as hospitals and factories, they must be able to communicate with humans. Several kinds of knowledge are required to understand and respond to a human's natural language commands and questions. If a person requests an assistant robot to "take me to Alice's office", the robot must know that Alice is a person who owns some unique office, and that "take me" means it should navigate there. Similarly, if a person requests "bring me the heavy, green mug", the robot must have accurate mental models of the physical concepts "heavy", "green", and "mug". To avoid forcing humans to use key phrases or words robots already know, this thesis focuses on helping robots understanding new language constructs through interactions with humans and with the world around them. To understand a command in natural language, a robot must first convert that command to an internal representation that it can reason with. Semantic parsing is a method for performing this conversion, and the target representation is often semantic forms represented as predicate logic with lambda calculus. Traditional semantic parsing relies on hand-crafted resources from a human expert: an ontology of concepts, a lexicon connecting language to those concepts, and training examples of language with abstract meanings. One thrust of this thesis is to perform semantic parsing with sparse initial data. We use the conversations between a robot and human users to induce pairs of natural language utterances with the target semantic forms a robot discovers through its questions, reducing the annotation effort of creating training examples for parsing. We use this data to build more dialog-capable robots in new domains with much less expert human effort. Meanings of many language concepts are bound to the physical world. Understanding object properties and categories, such as "heavy", "green", and "mug" requires interacting with and perceiving the physical world. Embodied robots can use manipulation capabilities, such as pushing, picking up, and dropping objects to gather sensory data about them. This data can be used to understand non-visual concepts like "heavy" and "empty" (e.g. "get the empty carton of milk from the fridge"), and assist with concepts that have both visual and non-visual expression (e.g. "tall" things look big and also exert force sooner than "short" things when pressed down on). A second thrust of this thesis focuses on strategies for learning these concepts using multi-modal sensory information. We use human-in-the-loop learning to get labels between concept words and actual objects in the environment. We also explore ways to tease out polysemy and synonymy in concept words such as "light", which can refer to a weight or a color, the latter sense being synonymous with "pale". Additionally, pushing, picking up, and dropping objects to gather sensory information is prohibitively time-consuming, so we investigate strategies for using linguistic information and human input to expedite exploration when learning a new concept. Finally, we build an integrated agent with both parsing and perception capabilities that learns from conversations with users to improve both components over time. We demonstrate that parser learning from conversations can be combined with multi-modal perception using predicate-object labels gathered through opportunistic active learning during those conversations to improve performance for understanding natural language commands from humans. Human users also qualitatively rate this integrated learning agent as more usable after it has improved from conversation-based learning.
    ML ID: 361
  2. Unsupervised Code-Switching for Multilingual Historical Document Transcription
    [Details] [PDF] [Slides (PDF)]
    Dan Garrette and Hannah Alpert-Abrams and Taylor Berg-Kirkpatrick and Dan Klein
    In Proceedings the 2015 Conference of the North American Chapter of the Association for Computational Linguistics -- Human Language Technologies (NAACL HLT 2015), 1036--1041, Denver, Colorado, June 2015.
    Transcribing documents from the printing press era, a challenge in its own right, is more complicated when documents interleave multiple languages—a common feature of 16th century texts. Additionally, many of these documents precede consistent orthographic conventions, making the task even harder. We extend the state-of-the-art historical OCR model of Berg-Kirkpatrick et al. (2013) to handle word-level code-switching between multiple languages. Further, we enable our system to handle spelling variability, including now-obsolete shorthand systems used by printers. Our results show average relative character error reductions of 14% across a variety of historical texts.
    ML ID: 312
  3. A Mixture Model with Sharing for Lexical Semantics
    [Details] [PDF] [Slides (PDF)]
    Joseph Reisinger and Raymond J. Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), 1173--1182, MIT, Massachusetts, USA, October 9--11 2010.
    We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: Polysemous word usage is often governed by some common background metaphoric usage (e.g. the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We demonstrate the applicability of tiered clustering, highlighting particular cases where modeling shared structure is beneficial and where it can be detrimental.
    ML ID: 252
  4. Cross-cutting Models of Distributional Lexical Semantics
    [Details] [PDF] [Slides (PDF)]
    Joseph S. Reisinger
    June 2010. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
    In order to respond to increasing demand for natural language interfaces—and provide meaningful insight into user query intent—fast, scalable lexical semantic models with flexible representations are needed. Human concept organization is a rich epiphenomenon that has yet to be accounted for by a single coherent psychological framework: Concept generalization is captured by a mixture of prototype and exemplar models, and local taxonomic information is available through multiple overlapping organizational systems. Previous work in computational linguistics on extracting lexical semantic information from the Web does not provide adequate representational flexibility and hence fails to capture the full extent of human conceptual knowledge. In this proposal I will outline a family of probabilistic models capable of accounting for the rich organizational structure found in human language that can predict contextual variation, selectional preference and feature-saliency norms to a much higher degree of accuracy than previous approaches. These models account for cross-cutting structure of concept organization—i.e. the notion that humans make use of different categorization systems for different kinds of generalization tasks—and can be applied to Web-scale corpora. Using these models, natural language systems will be able to infer a more comprehensive semantic relations, in turn improving question answering, text classification, machine translation, and information retrieval.
    ML ID: 249
  5. Spherical Topic Models
    [Details] [PDF] [Slides (PDF)]
    Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J. Mooney
    In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 2010.
    We introduce the Spherical Admixture Model (SAM), a Bayesian topic model for arbitrary L2 normalized data. SAM maintains the same hierarchical structure as Latent Dirichlet Allocation (LDA), but models documents as points on a high-dimensional spherical manifold, allowing a natural likelihood parameterization in terms of cosine distance. Furthermore, SAM can model word absence/presence at the document level, and unlike previous models can assign explicit negative weight to topic terms. Performance is evaluated empirically, both through human ratings of topic quality and through diverse classification tasks from natural language processing and computer vision. In these experiments, SAM consistently outperforms existing models.
    ML ID: 248
  6. Spherical Topic Models
    [Details] [PDF]
    Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond Mooney
    In NIPS'09 workshop: Applications for Topic Models: Text and Beyond, 2009.
    We introduce the Spherical Admixture Model (SAM), a Bayesian topic model over arbitrary L2 normalized data. SAM models documents as points on a high- dimensional spherical manifold, and is capable of representing negative word- topic correlations and word presence/absence, unlike models with multinomial document likelihood, such as LDA. In this paper, we evaluate SAM as a topic browser, focusing on its ability to model “negative” topic features, and also as a dimensionality reduction method, using topic proportions as features for difficult classification tasks in natural language processing and computer vision.
    ML ID: 237
  7. Model-based Overlapping Clustering
    [Details] [PDF]
    A. Banerjee, C. Krumpelman, S. Basu, Raymond J. Mooney and Joydeep Ghosh
    In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-05), 2005.
    While the vast majority of clustering algorithms are partitional, many real world datasets have inherently overlapping clusters. The recent explosion of analysis on biological datasets, which are frequently overlapping, has led to new clustering models that allow hard assignment of data points to multiple clusters. One particularly appealing model was proposed by Segal et al. in the context of probabilistic relational models (PRMs) applied to the analysis of gene microarray data. In this paper, we start with the basic approach of Segal et al. and provide an alternative interpretation of the model as a generalization of mixture models, which makes it easily interpretable. While the original model maximized likelihood over constant variance Gaussians, we generalize it to work with any regular exponential family distribution, and corresponding Bregman divergences, thereby making the model applicable for a wide variety of clustering distance functions, e.g., KL-divergence, Itakura-Saito distance, I-divergence. The general model is applicable to several domains, including high-dimensional sparse domains, such as text and recommender systems. We additionally offer several algorithmic modifications that improve both the performance and applicability of the model. We demonstrate the effectiveness of our algorithm through experiments on synthetic data as well as subsets of 20-Newsgroups and EachMovie datasets.
    ML ID: 163