Publications: Unsupervised Learning, Clustering, and Self-Organization
Unsupervised learning does not require annotation or labeling from a human teacher; the idea is to learn the structure of the data from unlabeled examples. The most common unsupervised learning task is clustering, i.e. grouping instances into a discovered set of categories containing similar instances. Self-organizing maps additionally visualize the topology of the clusters on a map. Our work in this area includes applications to lexical semantics, topic modeling, and discovering latent class models, as well as methods for laterally connected, hierarchical, sequential-input, and growing self-organizing maps.
- Continually Improving Grounded Natural Language Understanding through Human-Robot Dialog
[Details] [PDF]
Jesse Thomason
PhD Thesis, Department of Computer Science, The University of Texas at Austin, April 2018.As robots become ubiquitous in homes and workplaces such as hospitals and factories, they must be able to communicate with humans. Several kinds of knowledge are required to understand and respond to a human's natural language commands and questions. If a person requests an assistant robot to "take me to Alice's office", the robot must know that Alice is a person who owns some unique office, and that "take me" means it should navigate there. Similarly, if a person requests "bring me the heavy, green mug", the robot must have accurate mental models of the physical concepts "heavy", "green", and "mug". To avoid forcing humans to use key phrases or words robots already know, this thesis focuses on helping robots understanding new language constructs through interactions with humans and with the world around them.
To understand a command in natural language, a robot must first convert that command to an internal representation that it can reason with. Semantic parsing is a method for performing this conversion, and the target representation is often a semantic form expressed in predicate logic with lambda calculus. Traditional semantic parsing relies on hand-crafted resources from a human expert: an ontology of concepts, a lexicon connecting language to those concepts, and training examples of language with abstract meanings. One thrust of this thesis is to perform semantic parsing with sparse initial data. We use the conversations between a robot and human users to induce pairs of natural language utterances with the target semantic forms a robot discovers through its questions, reducing the annotation effort of creating training examples for parsing. We use this data to build more dialog-capable robots in new domains with much less expert human effort.
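For readers unfamiliar with this style of representation, the toy sketch below shows the kind of utterance/logical-form pair a semantic parser is trained on. The predicate names (navigate, office, owns) and the string encoding of the lambda-calculus form are invented for illustration; they are not the thesis's actual ontology or parser output.

```python
# Illustrative sketch only: a toy (utterance, logical form) training pair of
# the kind used to induce a semantic parser. Predicate names are hypothetical.

command = "take me to alice's office"

# A semantic parser maps the utterance to a lambda-calculus logical form,
# written here as a plain string for illustration:
logical_form = "navigate(the(lambda x. office(x) and owns(alice, x)))"

# Training examples for parser induction are pairs like this one; the thesis
# induces such pairs from clarification dialogs rather than expert annotation.
training_pair = (command, logical_form)
print(training_pair)
```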
Meanings of many language concepts are bound to the physical world. Understanding object properties and categories, such as "heavy", "green", and "mug", requires interacting with and perceiving the physical world. Embodied robots can use manipulation capabilities, such as pushing, picking up, and dropping objects, to gather sensory data about them. This data can be used to understand non-visual concepts like "heavy" and "empty" (e.g. "get the empty carton of milk from the fridge"), and assist with concepts that have both visual and non-visual expression (e.g. "tall" things look big and also exert force sooner than "short" things when pressed down on). A second thrust of this thesis focuses on strategies for learning these concepts using multi-modal sensory information. We use human-in-the-loop learning to obtain labels linking concept words to actual objects in the environment. We also explore ways to tease out polysemy and synonymy in concept words such as "light", which can refer to a weight or a color, the latter sense being synonymous with "pale". Additionally, pushing, picking up, and dropping objects to gather sensory information is prohibitively time-consuming, so we investigate strategies for using linguistic information and human input to expedite exploration when learning a new concept.
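As a rough illustration of grounding a single concept word in multi-modal sensory features, here is a minimal sketch with invented feature values, using scikit-learn's logistic regression as a stand-in for whatever per-word classifier is actually employed in the thesis.

```python
# Minimal sketch, not the thesis's implementation: learning a grounded
# concept classifier ("heavy") from multi-modal exploration features.
# The feature values and the choice of classifier are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: features gathered by exploratory behaviors on one object,
# e.g. haptics from "lift", audio from "drop", color statistics from "look".
X = np.array([
    [0.9, 0.8, 0.1],   # object 1: strong lift-effort signal
    [0.2, 0.1, 0.7],   # object 2: light
    [0.8, 0.9, 0.3],   # object 3: heavy
    [0.1, 0.2, 0.6],   # object 4: light
])
# Human-provided labels from dialog: does "heavy" apply to this object?
y = np.array([1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.85, 0.7, 0.2]]))  # predict for a newly explored object
```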
Finally, we build an integrated agent with both parsing and perception capabilities that learns from conversations with users to improve both components over time. We demonstrate that parser learning from conversations can be combined with multi-modal perception using predicate-object labels gathered through opportunistic active learning during those conversations to improve performance for understanding natural language commands from humans. Human users also qualitatively rate this integrated learning agent as more usable after it has improved from conversation-based learning.
ML ID: 361
- Unsupervised Code-Switching for Multilingual Historical Document Transcription
[Details] [PDF] [Slides (PDF)]
Dan Garrette and Hannah Alpert-Abrams and Taylor Berg-Kirkpatrick and Dan Klein
In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics -- Human Language Technologies (NAACL HLT 2015), 1036--1041, Denver, Colorado, June 2015. Transcribing documents from the printing press era, a challenge in its own right, is more complicated when documents interleave multiple languages, a common feature of 16th-century texts. Additionally, many of these documents precede consistent orthographic conventions, making the task even harder. We extend the state-of-the-art historical OCR model of Berg-Kirkpatrick et al. (2013) to handle word-level code-switching between multiple languages. Further, we enable our system to handle spelling variability, including now-obsolete shorthand systems used by printers. Our results show average relative character error reductions of 14% across a variety of historical texts.
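To give a flavor of what word-level code-switching means in a generative model, here is a conceptual sketch, not the authors' OCR model: a Markov chain over per-word language indicators, where each word would be generated by the language model of its indicator. The languages and switching probability are invented.

```python
# Conceptual sketch only (not the paper's model): word-level code-switching
# as a Markov chain over per-word language indicators.
import random

LANGS = ["spanish", "latin", "nahuatl"]  # hypothetical language inventory
STAY = 0.8  # assumed probability of staying in the current language

def sample_language_sequence(n_words, rng=random.Random(0)):
    """Sample a sequence of per-word language indicators."""
    langs = [rng.choice(LANGS)]
    for _ in range(n_words - 1):
        if rng.random() < STAY:
            langs.append(langs[-1])          # stay in the current language
        else:
            langs.append(rng.choice(LANGS))  # switch languages
    return langs

print(sample_language_sequence(10))
```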
ML ID: 312
- A Mixture Model with Sharing for Lexical Semantics
[Details] [PDF] [Slides (PDF)]
Joseph Reisinger and Raymond J. Mooney
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), 1173--1182, MIT, Massachusetts, USA, October 9--11, 2010. We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: Polysemous word usage is often governed by some common background metaphoric usage (e.g. the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We demonstrate the applicability of tiered clustering, highlighting particular cases where modeling shared structure is beneficial and where it can be detrimental.
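The following minimal sketch, with an invented vocabulary and probabilities, illustrates the generative intuition behind tiered clustering: each feature of an instance is drawn either from a shared background distribution or from the cluster-specific distribution. It is not the paper's actual model or inference procedure.

```python
# Illustrative generative sketch (not the paper's model or inference):
# a "tiered" mixture where each feature comes either from a shared
# background tier or from the cluster-specific tier.
import random

rng = random.Random(0)
background = ["the", "of", "run", "line"]          # shared, context-independent
clusters = {
    "sports": ["score", "team", "run"],
    "finance": ["stock", "price", "line"],
}
P_BACKGROUND = 0.4  # assumed probability a feature comes from the shared tier

def generate(cluster, n_features=8):
    features = []
    for _ in range(n_features):
        if rng.random() < P_BACKGROUND:
            features.append(rng.choice(background))       # shared tier
        else:
            features.append(rng.choice(clusters[cluster]))  # cluster tier
    return features

print(generate("sports"))
```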
ML ID: 252
- Cross-cutting Models of Distributional Lexical Semantics
[Details] [PDF] [Slides (PDF)]
Joseph S. Reisinger
June 2010. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin. In order to respond to increasing demand for natural language interfaces, and provide meaningful insight into user query intent, fast, scalable lexical semantic models with flexible representations are needed. Human concept organization is a rich epiphenomenon that has yet to be accounted for by a single coherent psychological framework: Concept generalization is captured by a mixture of prototype and exemplar models, and local taxonomic information is available through multiple overlapping organizational systems. Previous work in computational linguistics on extracting lexical semantic information from the Web does not provide adequate representational flexibility and hence fails to capture the full extent of human conceptual knowledge. In this proposal I will outline a family of probabilistic models capable of accounting for the rich organizational structure found in human language that can predict contextual variation, selectional preference, and feature-saliency norms to a much higher degree of accuracy than previous approaches. These models account for the cross-cutting structure of concept organization, i.e. the notion that humans make use of different categorization systems for different kinds of generalization tasks, and can be applied to Web-scale corpora. Using these models, natural language systems will be able to infer more comprehensive semantic relations, in turn improving question answering, text classification, machine translation, and information retrieval.
ML ID: 249
- Spherical Topic Models
[Details] [PDF] [Slides (PDF)]
Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J. Mooney
In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 2010. We introduce the Spherical Admixture Model (SAM), a Bayesian topic model for arbitrary L2-normalized data. SAM maintains the same hierarchical structure as Latent Dirichlet Allocation (LDA), but models documents as points on a high-dimensional spherical manifold, allowing a natural likelihood parameterization in terms of cosine distance. Furthermore, SAM can model word absence/presence at the document level, and unlike previous models can assign explicit negative weight to topic terms. Performance is evaluated empirically, both through human ratings of topic quality and through diverse classification tasks from natural language processing and computer vision. In these experiments, SAM consistently outperforms existing models.
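A minimal sketch of the key modeling idea, not the SAM implementation: an L2-normalized document vector scored against a topic direction by cosine similarity, as in a von Mises-Fisher log-likelihood. The normalizing constant is omitted and the concentration value is assumed.

```python
# Minimal sketch of the spherical-likelihood idea behind SAM (not the paper's
# code): documents live on the unit sphere and are scored against a topic
# direction by cosine similarity, i.e. a von Mises-Fisher log-density up to
# its normalizing constant.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

doc = normalize(np.array([3.0, 0.0, 1.0, 0.0]))     # tf-idf-like vector, L2-normalized
topic = normalize(np.array([1.0, 0.2, 0.5, -0.3]))  # note the explicit negative topic weight

kappa = 5.0                          # concentration parameter (assumed value)
log_score = kappa * doc.dot(topic)   # vMF log-density up to a constant
print(log_score)
```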
ML ID: 248
- Spherical Topic Models
[Details] [PDF]
Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond Mooney
In NIPS'09 workshop: Applications for Topic Models: Text and Beyond, 2009. We introduce the Spherical Admixture Model (SAM), a Bayesian topic model over arbitrary L2-normalized data. SAM models documents as points on a high-dimensional spherical manifold, and is capable of representing negative word-topic correlations and word presence/absence, unlike models with multinomial document likelihood, such as LDA. In this paper, we evaluate SAM as a topic browser, focusing on its ability to model “negative” topic features, and also as a dimensionality reduction method, using topic proportions as features for difficult classification tasks in natural language processing and computer vision.
ML ID: 237
- Model-based Overlapping Clustering
[Details] [PDF]
A. Banerjee, C. Krumpelman, S. Basu, Raymond J. Mooney and Joydeep Ghosh
In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-05), 2005. While the vast majority of clustering algorithms are partitional, many real-world datasets have inherently overlapping clusters. The recent explosion of analysis on biological datasets, which are frequently overlapping, has led to new clustering models that allow hard assignment of data points to multiple clusters. One particularly appealing model was proposed by Segal et al. in the context of probabilistic relational models (PRMs) applied to the analysis of gene microarray data. In this paper, we start with the basic approach of Segal et al. and provide an alternative interpretation of the model as a generalization of mixture models, which makes it easily interpretable. While the original model maximized likelihood over constant variance Gaussians, we generalize it to work with any regular exponential family distribution, and corresponding Bregman divergences, thereby making the model applicable for a wide variety of clustering distance functions, e.g., KL-divergence, Itakura-Saito distance, I-divergence. The general model is applicable to several domains, including high-dimensional sparse domains, such as text and recommender systems. We additionally offer several algorithmic modifications that improve both the performance and applicability of the model. We demonstrate the effectiveness of our algorithm through experiments on synthetic data as well as subsets of 20-Newsgroups and EachMovie datasets.
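For readers unfamiliar with Bregman divergences, the sketch below computes the one generated by negative entropy, i.e. the generalized KL-divergence, one of the distance functions the generalized model supports. The numbers are invented and this is not code from the paper.

```python
# Illustrative sketch of a Bregman divergence (not code from the paper).
# With phi(x) = sum x log x (negative entropy), the Bregman divergence
# D_phi(p, q) reduces to the generalized (unnormalized) KL-divergence;
# with phi(x) = ||x||^2 it would be squared Euclidean distance.
import numpy as np

def bregman_kl(p, q):
    """Bregman divergence generated by negative entropy = generalized KL."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q) - p + q)

print(bregman_kl([0.2, 0.3, 0.5], [0.3, 0.3, 0.4]))
```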
ML ID: 163