UTCS AI Colloquia - Chris Callison-Burch, Johns Hopkins University, "Large-scale paraphrasing for natural language understanding and generation"
Signup Schedule: http://apps.cs.utexas.edu/talkschedules/cgi/list_events.cgi
Talk Audience: UTCS Faculty, Grads, Undergrads, Other Interested Parties
Host: Ray Mooney
Talk Abstract: I will present my method for learning paraphrases (pairs of English expressions with equivalent meaning) from bilingual parallel corpora, which are more commonly used to train statistical machine translation systems. My method pairs English phrases like (thrown into jail, imprisoned) when they share an aligned foreign phrase like festgenommen. Because bitexts are large, and because a phrase can be aligned to many different foreign phrases (including phrases in multiple foreign languages), the method extracts a diverse set of paraphrases. For thrown into jail, we learn not only imprisoned but also arrested, detained, incarcerated, jailed, locked up, taken into custody, and thrown into prison, along with a set of incorrect or noisy paraphrases. I'll show a number of methods for filtering out the poor paraphrases: defining a paraphrase probability calculated from translation model probabilities, and re-ranking the candidate paraphrases using monolingual distributional similarity measures.
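The idea of scoring paraphrases through a shared foreign phrase can be illustrated with a short sketch. It assumes the "pivot" formulation, in which the probability of e2 being a paraphrase of e1 is obtained by summing over every foreign phrase f aligned to both: p(e2 | e1) = Σ_f p(f | e1) · p(e2 | f). The phrase tables, the German pivot phrases, and the function names `paraphrase_probs` and `ranked_paraphrases` are hypothetical and exist only for illustration. The numbers are invented toy values, not probabilities from any real translation model. The distributional-similarity re-ranking step is not shown.

```python
from collections import defaultdict

# Hypothetical toy phrase tables (invented numbers, for illustration only).
# p_f_given_e[e][f] = p(f | e): chance English phrase e is aligned to foreign phrase f.
p_f_given_e = {
    "thrown into jail": {"festgenommen": 0.6, "inhaftiert": 0.4},
}
# p_e_given_f[f][e] = p(e | f): chance foreign phrase f is aligned to English phrase e.
p_e_given_f = {
    "festgenommen": {"thrown into jail": 0.2, "imprisoned": 0.3, "arrested": 0.5},
    "inhaftiert": {"thrown into jail": 0.25, "imprisoned": 0.5, "detained": 0.25},
}


def paraphrase_probs(e1):
    """Return p(e2 | e1) for each e2 != e1 reachable through a shared foreign pivot."""
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:  # drop the trivial self-paraphrase
                scores[e2] += p_f * p_e2  # marginalize over pivot phrases f
    return dict(scores)


def ranked_paraphrases(e1):
    """Sort candidate paraphrases of e1 by descending paraphrase probability."""
    return sorted(paraphrase_probs(e1).items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for e2, p in ranked_paraphrases("thrown into jail"):
        print(f"{e2}\t{p:.2f}")
```

Because the self-paraphrase is dropped, the remaining scores need not sum to one. They are used here only to rank candidates, so a low-probability candidate could be filtered out with a threshold.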
Speaker Bio: Chris Callison-Burch is an Associate Research Professor in the Computer Science Department at Johns Hopkins University, where he has built a research group within the Center for Language and Speech Processing (CLSP). He has accepted a tenure-track faculty position at the University of Pennsylvania starting in September 2013. He received his PhD from the University of Edinburgh's School of Informatics and his bachelor's degree from Stanford University's Symbolic Systems Program. His research focuses on statistical machine translation, crowdsourcing, and broad-coverage semantics via paraphrasing. He has contributed to the research community by releasing open source software such as Moses and Joshua, and by organizing the shared tasks for the annual Workshop on Statistical Machine Translation (WMT). He is the Chair of the North American Chapter of the Association for Computational Linguistics (NAACL) and serves on the editorial boards of Computational Linguistics and the Transactions of the ACL.