UTCS Colloquium/AI-Dan Klein/University of California at Berkeley: "Phylogenetic Models for Natural Language," ACES 2.402, Friday, March 12, 2010, 11:00 a.m.

Contact Name: Jenna Whitney
Date: Mar 12, 2010 11:00am - 12:00pm

There is a sign-up schedule for this event that can be found at
http://www.cs.utexas.edu/department/webevent/utcs/events/cgi/list_events.cgi

Type of Talk: UTCS Colloquium/AI

Speaker/Affiliation: Dan Klein/University of California at Berkeley

Date/Time: Friday, March 12, 2010, 11:00 a.m.

Location: ACES 2.402

Host: Ray Mooney

Talk Title: Phylogenetic Models for Natural Language

Talk Abstract:

Languages descend in a roughly tree-structured evolutionary process. In
historical linguistics, this process is manually analyzed by comparing and
contrasting modern languages. Many questions arise: What does the tree of
languages look like? What are the ancestral forms of modern words? What
functional pressures shape language change? In this talk, I'll describe our
work on bringing large-scale computational methods to bear on these
problems.

In the task of proto-word reconstruction, we infer ancestral words from
their modern forms. I'll present a statistical model in which each word's
history is traced down a phylogeny. Along each branch, words mutate
according to regular, learned sound changes. Experiments in the Romance and
Oceanic families show that accurate automated reconstruction is possible;
using more languages leads to better results.

Standard reconstruction models assume that one already knows which words are
cognate, i.e., are descended from the same ancestral word. However, cognate
detection is its own challenge. I'll describe models which can automatically
detect cognates (in similar languages) and translations (in divergent
languages). Typical translation-learning approaches require virtual Rosetta
stones -- collections of bilingual texts. In contrast, I'll discuss models
which operate on monolingual texts alone.

Finally, I'll present work on multilingual grammar induction, where many
languages' grammars are simultaneously induced. By assuming that grammar
parameters vary slowly, again along a phylogenetic tree, we can obtain
substantial increases in grammar quality across the board.

Speaker Bio:

Dan Klein is an associate professor of computer science at the University
of California, Berkeley (PhD Stanford, MSt Oxford, BA Cornell). His research
focuses on statistical natural language processing, including unsupervised
learning methods, syntactic analysis, information extraction, and machine
translation. Academic honors include a Marshall Fellowship, a Microsoft New
Faculty Fellowship, a Sloan Fellowship, an NSF CAREER award, the ACM Grace
Murray Hopper award, and best paper awards at the ACL, NAACL, and EMNLP
conferences.