May 2, 2008 11:00am - 12:00pm

Type of Talk: UTCS Colloquium/AI

Speaker/Affiliation: Thamar Solorio/University of Texas at Dallas

Date/Time: Friday, May 2, 2008

Location: ACES 2.402

Host: Raymond Mooney

Talk Title:
Processing Code-Switched Text

Talk Abstract:
Code-switching is an interesting linguistic phenomenon commonly
observed in highly bilingual communities. It consists of mixing
languages in the same conversational event. Despite its popularity,
this type of discourse has received very little attention from the
natural language processing community. Most of the work in this area
attempts to solve problems where the language samples, either spoken
or written, are monolingual.

We recently started working on developing a part-of-speech tagger
for Spanish-English code-switched text. In the first half of this
talk I will discuss results of different approaches to solve the
tagging problem by taking advantage of existing resources for both
languages. The long-term goal of this research is to develop a full
syntactic parser for English-Spanish code-switched text, commonly
known as Spanglish, that can be exploited to tackle higher-level
tasks on mixed-language sources. Although the work is focused on
English-Spanish bilingual discourse, the knowledge acquired from this
project can later be extended to other language combinations. In the
second half I will discuss a related project aimed at exploiting our
bilingual tagger to develop an automated screening tool for the early
identification of Specific Language Impairment in Spanish-English
bilingual

Speaker Bio:

Solorio is a postdoctoral scholar in the Human Language Technology
Research Institute at the University of Texas at Dallas. Before
joining UTD she was a Lecturer in the Computer Science department at
the University of Texas at El Paso. She received her PhD in Computer
Science in 2005 from the National Institute of Astrophysics, Optics
and Electronics in Mexico. She is interested in developing machine
learning approaches for the syntactic analysis of interlanguages.