Spherical Topic Models (2010)
We introduce the Spherical Admixture Model (SAM), a Bayesian topic model for arbitrary L2-normalized data. SAM maintains the same hierarchical structure as Latent Dirichlet Allocation (LDA), but models documents as points on a high-dimensional spherical manifold, allowing a natural likelihood parameterization in terms of cosine distance. Furthermore, SAM can model word absence/presence at the document level and, unlike previous models, can assign explicit negative weight to topic terms. Performance is evaluated empirically, both through human ratings of topic quality and through diverse classification tasks from natural language processing and computer vision. In these experiments, SAM consistently outperforms existing models.
In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 2010.
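
The geometry described in the abstract can be illustrated with a minimal sketch. It only shows L2 normalization, a topic direction carrying a negative term weight, and cosine similarity between a document and a mixture of topics; it is not the paper's inference procedure or likelihood. The function names, the toy vocabulary, and the mixing weights are all hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm so it lies on the unit sphere."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [v / norm for v in vec]

def cosine_similarity(a, b):
    """Cosine similarity; for unit vectors this equals the dot product."""
    a_hat, b_hat = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_hat, b_hat))

def mix_topics(topics, weights):
    """Weighted sum of topic directions, projected back onto the sphere."""
    dim = len(topics[0])
    combined = [sum(w * t[i] for w, t in zip(weights, topics)) for i in range(dim)]
    return l2_normalize(combined)

# Hypothetical toy data: a 4-term vocabulary and two topic directions.
topic_a = l2_normalize([1.0, 1.0, 0.0, -0.5])  # negative weight on the last term
topic_b = l2_normalize([0.0, 0.0, 1.0, 1.0])
doc = l2_normalize([3.0, 2.0, 0.0, 0.0])

# Assumed mixing weights; a document's expected direction is a topic mixture.
mean_direction = mix_topics([topic_a, topic_b], [0.8, 0.2])
score = cosine_similarity(doc, mean_direction)
```

Because every vector is rescaled to unit length, `score` depends only on the angle between the document and the mixed topic direction, not on document length. Unlike a multinomial topic, a direction on the sphere may have negative components, as `topic_a` does here.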

Slides (PDF)
Raymond J. Mooney Faculty mooney [at] cs utexas edu
Joseph Reisinger Ph.D. Alumni joeraii [at] cs utexas edu
Bryan Silverthorn Ph.D. Alumni bsilvert [at] cs utexas edu
Austin Waters Ph.D. Alumni austin [at] cs utexas edu