UTCS Colloquia/AI - Yejin Choi/SUNY Stony Brook, "In Search of Styles in Language: Identifying Deceptive Product Reviews, Wikipedia Vandalism, and the Gender of Authors via Statistical Stylometric Analysis", ACES 2.302

Contact Name: Jenna Whitney
Date: Oct 14, 2011 11:00am - 12:00pm

There is a sign-up schedule for this event that can be found at

http://apps.cs.utexas.edu/talkschedules/cgi/list_events.cgi

Type of Talk: UTCS Colloquia/AI

Speaker/Affiliation: Yejin Choi/SUNY Stony Brook

Talk Audience: UTCS Faculty, Graduate and Undergraduate Students, and Outside Interested Parties

Date/Time: Friday, October 14, 2011, 11:00 a.m.

Location: ACES 2.302

Host: Raymond Mooney

Talk Title: In Search of Styles in Language: Identifying Deceptive Product Reviews, Wikipedia Vandalism, and the Gender of Authors via Statistical Stylometric Analysis

Talk Abstract:
Language is a window into the mind. Stylometric analysis, the study of
linguistic styles in language, can help uncover the cognitive state and the
personal identity of the writer. In this talk, I will present three case
studies of Natural Language Processing (NLP) tasks that expand the scope of
statistical stylometric analysis. First, I will present the study of
identifying deceptive product reviews, i.e., fake reviews that are written
by people who are paid to fabricate positive reviews. As it turns out, it is
surprisingly hard for humans to distinguish fake reviews from truthful ones.
Statistical analysis of language use, on the other hand, leads to nearly 90%
accuracy and provides us with new clues for spotting suspicious reviews.
Next, I will introduce the study of detecting Wikipedia vandalism, where
textual vandalism can be viewed as a unique genre in which a group of people
with a similar purpose share similar linguistic behavior. Finally, I will
present the study of gender attribution, where we will examine whether there
are gender-specific linguistic signals that go beyond the boundaries of
topic and genre, and whether they are traceable even in modern and
scientific literature.

Speaker Bio:

Yejin Choi is an Assistant Professor in the Computer Science Department at
Stony Brook University (SUNY Stony Brook). She received her Ph.D. in
Computer Science from Cornell University in 2010 in the area of Natural
Language Processing. Her research interests include stylometric analysis,
natural language generation from images, and opinion & sentiment analysis
in text.