Capturing Semantically Meaningful Word Dependencies with an Admixture of Poisson MRFs (code)
D. Inouye, P. Ravikumar, I. Dhillon
To appear in Neural Information Processing Systems (NIPS) 27, 2014.
Admixture of Poisson MRFs: A Topic Model with Word Dependencies
D. Inouye, P. Ravikumar, I. Dhillon
In International Conference on Machine Learning (ICML) 31, 2014.
This project develops a new topic model based on an admixture of Poisson Markov Random Fields (APM), which can model dependencies between words as opposed to previous independent topic models such as PLSA (Hofmann, 1999), LDA (Blei, 2003) or SAM (Reisinger, 2010). More generally, this project considers a class of admixture models that generalizes previous topic models. In addition, we show an equivalence between the conditional distribution of LDA and independent Poissons---suggesting that APM subsumes the modeling power of LDA. This project also considers tractable methods for estimating the parameters of an APM based on the pseudo log-likelihood. Essentially, the estimation method turns into a large-scale optimization problem. Some preliminary qualitative and quantitative experiments show that APM has potential but more experimental analysis will be needed and thus this project seeks to explore the power and usefulness of APM.
Topic Models as Admixtures
In this project, we develop the notion of admixtures and show that this notion generalizes many previous topic models including PLSA (Hofmann, 1999), LDA (Blei, 2003) and SAM (Reisinger, 2010). In contrast to mixture distributions which assume that each observation is drawn from 1 of k component distributions, admixture distributions assume that each observation is drawn from an admixed distribution whose parameters are a mixture of component parameters. See Figure 1 for an illustration of the difference between admixtures and mixtures. As examples of admixtures, PLSA and LDA are admixtures of Multinomials whereas SAM is an admixture of Von-Mises Fisher distributions. In addition, because of the connections between Poissons and Multinomials, PLSA and LDA can be seen as admixtures of independent Poisson distributions. In this project, we remove this independence assumption from the Poisson distributions.
Poisson MRF vs. Independent Distributions
As stated in the previous section, instead of using independent Poisson distributions for the component distributions, we introduce using a Poisson MRF for the component distributions so that we can model dependencies between words. A multivariate generalization of the univariate Poisson was recently introduced by (Yang, 2012). This new Poisson MRF model assumes that the conditional distributions are univariate Poisson which is similar to a Gaussian MRF (unlike a Guassian MRF, however, the marginals are not univariate Poisson). This allows for a much more powerful topic distribution than an independent distribution. For example, if the word “classification” appears, “supervised” is more likely to appear than in general documents. Moreover, since these are graphical model distributions, the dependencies are Markov with respect to an undirected graph, which thus provides a visually appealing representation of any topic—see Fig. 2 below—in contrast to just a list of words as in PLSA/LDA.
Admixture of Poisson MRFs
The main model introduced by this project is an admixture of Poisson MRFs and thus combines the two concepts discussed in the previous sections to provide a topic model whose topics can model the dependencies between words.
Tractable Parameter Estimation
Essentially, we choose to maximize the posterior (i.e. a MAP estimate) with a exponential prior on the dependency matrix (which is similar to the precision matrix of a Gaussian MRF). The exponential prior helps to enforce sparsity of the parameters for reasons of interpretability, generalizability and computational tractability. Because the likelihood of a Poisson MRF is intractable, we use the psuedo-likelihood instead. See (Inouye, 2014) for more details.