Weakly-Supervised Bayesian Learning of a CCG Supertagger (2014)
Dan Garrette, Chris Dyer, Jason Baldridge, and Noah A. Smith
We present a Bayesian formulation for weakly-supervised learning of a Combinatory Categorial Grammar (CCG) supertagger with an HMM. We assume supervision in the form of a tag dictionary, and our prior encourages the use of cross-linguistically common category structures as well as transitions between tags that can combine locally according to CCG's combinators. Our prior is theoretically appealing since it is motivated by language-independent, universal properties of the CCG formalism. Empirically, we show that it yields substantial improvements over previous work that used similar biases to initialize an EM-based learner. Additional gains are obtained by further shaping the prior with corpus-specific information that is extracted automatically from raw text and a tag dictionary.
In Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL-2014), pp. 141–150, Baltimore, MD, June 2014.

Dan Garrette Ph.D. Alumni dhg [at] cs utexas edu