UTCS Colloquium/AI-Burr Settles/CMU: "Asking the Right Questions: New Query Types for Active Learning," ACES 2.402, Thursday, March 11, 2010, 2:00 p.m.

Contact Name: Jenna Whitney
Date: Mar 11, 2010 2:00pm - 3:00pm

There is a sign-up schedule for this event that can be found at
http://www.cs.utexas.edu/department/webevent/utcs/events/cgi/list_events.cgi

Type of Talk: UTCS Colloquium/AI

Speaker/Affiliation: Burr Settles/CMU

Date/Time: Thursday, March 11, 2010, 2:00 p.m.

Location: ACES 2.402

Host: Ray Mooney

Talk Title: Asking the Right Questions: New Query Types for Active Learning

Talk Abstract:

The key idea behind active learning is that a machine learning algorithm
can achieve greater accuracy with less training if it is allowed to choose
the data from which it learns. In this talk, I present two recent active
learning paradigms in which learning algorithms may pose novel types of
"queries" of human annotators to great effect. We call these new paradigms
"multiple-instance active learning" and "feature active learning."

In traditional active learning, a partially-trained model selects new data
instances to be labeled by a human annotator, which are then added to the
training set and the process repeats. In a text classification task, for
example, the learner might query for the labels of informative-looking
documents. However, having a human read an entire document can be an
inefficient use of time, particularly when only certain passages or keywords
are relevant to the task at hand. Multiple-instance active learning
addresses this problem by allowing the model to selectively obtain more
focused labels at the passage level in cases where noisy document-level
labels might be available (e.g., from hyperlinks or citation databases).
This active learning approach provides a direct training signal to the
learner and is also less cumbersome for humans to read. Likewise, feature
active learning allows the learner to query for the labels of salient words
(e.g., the query word "puck" might be labeled "hockey" in a sports article
classification task), which naturally exploits the annotator's inherent
domain knowledge. We show that such alternative query paradigms, especially
when combined with intuitive user interfaces, can make more efficient use of
human annotation effort.

[Joint work with Mark Craven, Soumya Ray, Gregory Druck, and Andrew
McCallum.]