Publications: Unsupervised Learning, Clustering, and Self-Organization
Unsupervised learning does not require annotation or labeling from a human teacher; the idea is to learn the structure of the data from unlabeled examples. The most common unsupervised learning task is clustering, i.e. grouping instances into a discovered set of categories containing similar instances. Self-organizing maps additionally visualize the topology of the clusters on a map. Our work in this area includes applications to lexical semantics, topic modeling, and discovering latent class models, as well as methods for laterally connected, hierarchical, sequential-input, and growing self-organizing maps.
- A Mixture Model with Sharing for Lexical Semantics
[Details] [PDF] [Slides]
Joseph Reisinger and Raymond J. Mooney
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), 1173--1182, MIT, Massachusetts, USA, October 9--11 2010.
We introduce tiered clustering, a mixture model capable of accounting for varying degrees of shared (context-independent) feature structure, and demonstrate its applicability to inferring distributed representations of word meaning. Common tasks in lexical semantics such as word relatedness or selectional preference can benefit from modeling such structure: Polysemous word usage is often governed by some common background metaphoric usage (e.g. the senses of line or run), and likewise modeling the selectional preference of verbs relies on identifying commonalities shared by their typical arguments. Tiered clustering can also be viewed as a form of soft feature selection, where features that do not contribute meaningfully to the clustering can be excluded. We demonstrate the applicability of tiered clustering, highlighting particular cases where modeling shared structure is beneficial and where it can be detrimental.
ML ID: 252
- Cross-cutting Models of Distributional Lexical Semantics
[Details] [PDF] [Slides]
Joseph S. Reisinger
June 2010. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
In order to respond to increasing demand for natural language interfaces, and to provide meaningful insight into user query intent, fast, scalable lexical semantic models with flexible representations are needed. Human concept organization is a rich epiphenomenon that has yet to be accounted for by a single coherent psychological framework: Concept generalization is captured by a mixture of prototype and exemplar models, and local taxonomic information is available through multiple overlapping organizational systems. Previous work in computational linguistics on extracting lexical semantic information from the Web does not provide adequate representational flexibility and hence fails to capture the full extent of human conceptual knowledge. In this proposal I will outline a family of probabilistic models capable of accounting for the rich organizational structure found in human language that can predict contextual variation, selectional preference and feature-saliency norms to a much higher degree of accuracy than previous approaches. These models account for the cross-cutting structure of concept organization (i.e. the notion that humans make use of different categorization systems for different kinds of generalization tasks) and can be applied to Web-scale corpora. Using these models, natural language systems will be able to infer more comprehensive semantic relations, in turn improving question answering, text classification, machine translation, and information retrieval.
ML ID: 249
- Spherical Topic Models
[Details] [PDF] [Slides]
Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J. Mooney
In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 2010.
We introduce the Spherical Admixture Model (SAM), a Bayesian topic model for arbitrary L2-normalized data. SAM maintains the same hierarchical structure as Latent Dirichlet Allocation (LDA), but models documents as points on a high-dimensional spherical manifold, allowing a natural likelihood parameterization in terms of cosine distance. Furthermore, SAM can model word absence/presence at the document level, and unlike previous models can assign explicit negative weight to topic terms. Performance is evaluated empirically, both through human ratings of topic quality and through diverse classification tasks from natural language processing and computer vision. In these experiments, SAM consistently outperforms existing models.
ML ID: 248
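The abstract above describes documents as points on a unit sphere with a likelihood expressed through cosine distance. The following is a minimal sketch of that geometric idea only, not the SAM model or its inference procedure. It assumes a von Mises-Fisher-style score (concentration `kappa` times cosine similarity) as the unnormalized log-density; the function names, the toy vectors, and the choice of `kappa` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of the L2-normalized vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def unnormalized_log_density(doc: Sequence[float],
                             mean_direction: Sequence[float],
                             kappa: float) -> float:
    """Assumed vMF-style score: kappa * cos(doc, mean_direction).

    The normalizing constant (which depends only on kappa and the
    dimension) is omitted, so scores are comparable only for a fixed kappa.
    """
    return kappa * cosine_similarity(doc, mean_direction)


# Hypothetical two-word vocabulary; the topic gives word 2 a negative weight,
# which a multinomial topic (non-negative probabilities) cannot express.
topic = [0.8, -0.6]
doc_without_word2 = [1.0, 0.0]
doc_with_word2 = [1.0, 1.0]

score_without = unnormalized_log_density(doc_without_word2, topic, kappa=10.0)
score_with = unnormalized_log_density(doc_with_word2, topic, kappa=10.0)
```

In this toy setting the document that contains the negatively weighted word receives the lower score, which illustrates how a direction on the sphere can penalize a word's presence.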
- Spherical Topic Models
[Details] [PDF]
Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond Mooney
In NIPS'09 workshop: Applications for Topic Models: Text and Beyond, 2009.
We introduce the Spherical Admixture Model (SAM), a Bayesian topic model over arbitrary L2-normalized data. SAM models documents as points on a high-dimensional spherical manifold, and is capable of representing negative word-topic correlations and word presence/absence, unlike models with multinomial document likelihood, such as LDA. In this paper, we evaluate SAM as a topic browser, focusing on its ability to model “negative” topic features, and also as a dimensionality reduction method, using topic proportions as features for difficult classification tasks in natural language processing and computer vision.
ML ID: 237
- Model-based Overlapping Clustering
[Details] [PDF]
A. Banerjee, C. Krumpelman, S. Basu, Raymond J. Mooney and Joydeep Ghosh
In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-05), 2005.
While the vast majority of clustering algorithms are partitional, many real-world datasets have inherently overlapping clusters. The recent explosion of analysis on biological datasets, which are frequently overlapping, has led to new clustering models that allow hard assignment of data points to multiple clusters. One particularly appealing model was proposed by Segal et al. in the context of probabilistic relational models (PRMs) applied to the analysis of gene microarray data. In this paper, we start with the basic approach of Segal et al. and provide an alternative interpretation of the model as a generalization of mixture models, which makes it easily interpretable. While the original model maximized likelihood over constant variance Gaussians, we generalize it to work with any regular exponential family distribution, and corresponding Bregman divergences, thereby making the model applicable for a wide variety of clustering distance functions, e.g., KL-divergence, Itakura-Saito distance, I-divergence. The general model is applicable to several domains, including high-dimensional sparse domains, such as text and recommender systems. We additionally offer several algorithmic modifications that improve both the performance and applicability of the model. We demonstrate the effectiveness of our algorithm through experiments on synthetic data as well as subsets of 20-Newsgroups and EachMovie datasets.
ML ID: 163
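The abstract above mentions Bregman divergences as the family of distance functions covered by the generalized model. As a rough illustration of that family only (not the overlapping clustering algorithm itself), the sketch below uses the standard definition d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> and three common convex generators. All function names are illustrative assumptions, and inputs to the I-divergence and Itakura-Saito generators are assumed to be strictly positive.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman_divergence(phi: Callable[[Vector], float],
                       grad_phi: Callable[[Vector], List[float]],
                       x: Vector, y: Vector) -> float:
    """d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner


# phi(x) = ||x||^2 yields the squared Euclidean distance.
def squared_norm(x: Vector) -> float:
    return sum(v * v for v in x)


def squared_norm_grad(x: Vector) -> List[float]:
    return [2.0 * v for v in x]


# phi(x) = sum x_i log x_i yields the I-divergence, which equals the
# KL-divergence when x and y both sum to one.
def neg_entropy(x: Vector) -> float:
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x: Vector) -> List[float]:
    return [math.log(v) + 1.0 for v in x]


# phi(x) = -sum log x_i (Burg entropy) yields the Itakura-Saito distance.
def burg_entropy(x: Vector) -> float:
    return -sum(math.log(v) for v in x)


def burg_entropy_grad(x: Vector) -> List[float]:
    return [-1.0 / v for v in x]


# Usage on two toy probability vectors (hypothetical data).
p = [0.5, 0.5]
q = [0.25, 0.75]
kl_pq = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
is_pq = bregman_divergence(burg_entropy, burg_entropy_grad, p, q)
```

Swapping the generator changes the distance without touching the rest of the computation, which is the property that lets a single clustering formulation cover several distance functions.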