UTCS Colloquia/AI - Chris Dyer/Carnegie Mellon University, "Statistical Translation as Constrained Optimization", PAI 3.14
Type of Talk: UTCS Colloquia/AI
Speaker/Affiliation: Chris Dyer/Carnegie Mellon University
Talk Audience: UTCS Faculty, Graduate Students, Undergraduate Students and Outside Interested Parties
Date/Time: Friday, December 2, 2011, 3:00 p.m.
Location: PAI 3.14
Host: Matt Lease and Ray Mooney
Talk Title: Statistical Translation as Constrained Optimization
Talk Abstract:
I discuss translation as an optimization problem subject to three kinds of constraints: lexical, relational, and constraints enforcing target-language wellformedness. Lexical constraints ensure that the lexical choices in the output are meaning-preserving; relational constraints ensure that the relationships between words and phrases (e.g., semantic roles and modifier-head relationships) are properly transformed; and target-language wellformedness constraints ensure the grammaticality of the output. In terms of the traditional source-channel model of Brown et al. (1993), the "translation model" encodes lexical and relational constraints and the "language model" encodes target-language wellformedness constraints. This constraint-based framework suggests a discriminative (generate-and-test) model of translation in which constraints are encoded as features sensitive to input and output elements, and the feature weights are trained to maximize the (conditional) likelihood of the parallel data.
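As a rough illustration of the generate-and-test idea described above, the following sketch scores a handful of candidate translations with a log-linear model and normalizes the scores into conditional probabilities. Everything in it is an assumption made for illustration: the feature names, the weights, the feature values, and the function names are hypothetical and are not taken from the talk. Training the weights on parallel data is not shown.

```python
import math

# Hypothetical weights, one per constraint family named in the abstract.
WEIGHTS = {"lexical": 1.5, "relational": 1.0, "wellformed": 0.5}


def score(features, weights=WEIGHTS):
    """Weighted sum of feature values for one candidate translation."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def conditional_probs(candidates, weights=WEIGHTS):
    """Normalize candidate scores into p(candidate | input) with a softmax."""
    scores = [score(features, weights) for features in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up feature values: how well each candidate satisfies each constraint.
CANDIDATES = [
    {"lexical": 1.0, "relational": 1.0, "wellformed": 1.0},
    {"lexical": 1.0, "relational": 0.0, "wellformed": 1.0},
    {"lexical": 0.0, "relational": 1.0, "wellformed": 0.0},
]

probs = conditional_probs(CANDIDATES)
best = max(range(len(CANDIDATES)), key=probs.__getitem__)
```

In this toy setting the candidate that satisfies all three constraint families receives the highest probability. A trained model would instead adjust the weights so that the reference translations in the parallel data become more likely than the alternatives.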
To verify the usefulness of the constraint-based approach, I discuss the performance of two models. The first is a lexical translation model evaluated by the word alignments it learns. Unlike previous unsupervised alignment models, the new model utilizes features that capture diverse lexical and alignment relationships, including morphological relatedness, orthographic similarity, and conventional co-occurrence statistics. Results from typologically diverse language pairs demonstrate that the feature-rich model provides substantial performance benefits compared to state-of-the-art generative models. Second, I discuss the results of an end-to-end translation system in which lexical, relational, and wellformedness constraints are modeled independently. Because of the independence assumptions, the model is substantially more compact than state-of-the-art translation models, but still performs significantly better on languages where source-target word order differences are substantial.
Speaker Bio:
Chris Dyer is a postdoctoral researcher in Noah Smith's lab in the Language Technologies Institute at Carnegie Mellon University. He completed his PhD on statistical machine translation with Philip Resnik at the University of Maryland in 2010. Together with Jimmy Lin, he is author of "Data-Intensive Text Processing with MapReduce", published by Morgan & Claypool in 2010. Current research interests include machine translation, unsupervised learning, Bayesian techniques, and "big data" problems in NLP.