Raymond J. Mooney's Presentations
Research Colloquia
-
Learning Language from its Perceptual Context
(PPT file)
presented at
University of Texas at Dallas, Department of Computer Science,
March 6, 2008; and University of North Texas, Denton, Department of
Computer Science, March 7, 2008.
ABSTRACT:
Current systems that learn to process natural language require
laboriously constructed human-annotated training data. Ideally, a
computer would be able to acquire language like a child by being
exposed to linguistic input in the context of a relevant but ambiguous
perceptual environment. As a step in this direction, we present a
system that learns language from sportscasts of simulated soccer
games. The training data consists of textual human commentaries on
Robocup simulation games. A set of possible meanings for each comment
is automatically constructed from game event traces. Our previously
developed systems for learning to parse and generate natural language
(KRISP and WASP) were augmented to learn from this data and then
commentate novel games. The system is evaluated based on its ability
to parse sentences into correct meanings and generate accurate
descriptions of game events. Human evaluation was also conducted on
the overall quality of the generated sportscasts and compared to
human-generated commentaries.
Joint work with David Chen, Rohit Kate, and Yuk Wah Wong
-
Learning for Semantic Parsing of Natural Language
(PPT file)
presented at
Carnegie Mellon University, Pittsburgh, PA, Dec. 12, 2005; and
Department of Computer Science, University of Illinois at Urbana-Champaign,
April 28, 2006.
ABSTRACT:
Semantic parsing is the task of mapping a natural-language sentence
into a detailed formal representation of its meaning. This talk
presents a summary of our research on learning semantic parsers from
corpora of sentences annotated with formal representations. Our
original work employed inductive-logic programming methods to learn
deterministic symbolic parsers; our more recent work has applied
current techniques from statistical syntactic parsing, machine
translation, and support vector machines using string kernels to learn
more robust semantic parsers. We present results on learning to
interpret natural language database queries and robot commands
(Robocup coaching instructions).
Joint work with Ruifang Ge, Rohit Kate, and Yuk-Wah Wong
-
Learning to Extract Proteins and their Interactions from Medline Abstracts
(PPT file)
presented at the University of
Washington, Seattle, WA, June 22, 2005.
ABSTRACT:
Automatically extracting information from biomedical text holds the
promise of easily consolidating large amounts of biological knowledge
in computer-accessible form. This strategy is particularly attractive
for extracting data on human genes from the 11 million abstracts in
Medline. We have developed and evaluated a variety of learned
information-extraction systems for identifying human proteins and
their interactions in Medline abstracts. We will present our current
best results on identifying names of human proteins using Conditional
Random Fields and Relational Markov Networks. We will also present
our current best results on identifying interactions between proteins
using a Support Vector Machine with an underlying string
kernel. Finally, we will summarize results from a recent large-scale
application of our techniques, in which we mined 753,459 Medline
abstracts to extract a database of 6,580 interactions between 3,737
human proteins. By merging this extracted data with existing
databases, we have constructed (to our knowledge) the largest database
of known human-protein interactions, containing 31,609 interactions
amongst 7,748 proteins.
Joint work with Razvan Bunescu, Edward Marcotte, Ruifang Ge, Rohit
Kate, Yuk-Wah Wong, and Arun Ramani.
-
Learning to Extract Proteins and their Interactions from Medline Abstracts
(PPT file)
presented at the University of
Pennsylvania 3/2/04, and Cornell University 10/21/04.
ABSTRACT:
Automatically extracting information from biomedical text holds the
promise of easily consolidating large amounts of biological knowledge
in computer-accessible form. This strategy is particularly attractive
for extracting data on human genes from the 11 million abstracts in
Medline. We have developed and evaluated a variety of learned
information-extraction systems for identifying human proteins and
their interactions in Medline abstracts. We demonstrate that
machine-learning approaches using support-vector machines,
maximum-entropy, and conditional random fields are able to identify
human proteins with higher accuracy than several previous
approaches. We also demonstrate that various rule induction methods
are able to identify protein interactions more accurately than
manually-developed rules. I will also discuss our recent results on
collectively extracting all protein names in an abstract using
Relational Markov Networks that utilize specific relations between
possible protein references.
Joint work with Razvan Bunescu, Edward Marcotte, Ruifang Ge, Rohit
Kate, Yuk-Wah Wong, and Arun Ramani.
-
Semi-Supervised Clustering and its Application to Document Clustering and
Record Linkage
(PPT file)
presented
at the Univ. of Maryland College Park 7/9/03, Naval Research Laboratory 12/15/03,
and Google Inc. 3/25/04.
ABSTRACT:
Semi-supervised clustering uses a small amount of labeled data to aid
the clustering of unlabeled data. It therefore learns from both
labeled and unlabeled data differently than semi-supervised
classification methods like co-training and transductive SVMs. We
present two new algorithms that allow supervised data to bias
clustering. The first approach uses labeled data to seed and
constrain the k-means clustering algorithm, and has been successfully
applied to clustering text documents into topic-based categories. The
second approach applies EM and SVMs to labeled data to train an
adaptive similarity metric for comparing textual database records, and
then applies hierarchical agglomerative clustering with the trained
metric to cluster unlabeled records. This approach has been
successfully applied to record-linkage, the problem of identifying
syntactically distinct but similar database records (such as mailing
addresses or bibliographic citations) that refer to the same entity.
Finally, we discuss combining the two approaches and actively
selecting the most informative labeled data.
Joint work with Sugato Basu, Misha Bilenko, and Arindam Banerjee
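The first approach described above (labeled data used to seed and constrain
k-means) can be illustrated with a minimal sketch. This is not the authors'
implementation: the function name seeded_kmeans, its parameters, the Euclidean
distance, and the fixed iteration count are all assumptions made for
illustration only.

```python
import math

def seeded_kmeans(points, seeds, n_iter=10):
    """Hypothetical sketch of seeded, constrained k-means.

    points: list of equal-length numeric tuples.
    seeds:  dict mapping a point index to its known cluster label.
    Returns a list with one cluster label per point.
    """
    clusters = sorted(set(seeds.values()))
    dim = len(points[0])

    def mean(indices):
        # Component-wise average of the selected points.
        return tuple(sum(points[i][d] for i in indices) / len(indices)
                     for d in range(dim))

    # Seeding: each initial centroid is the mean of that cluster's labeled points.
    centroids = {c: mean([i for i, lab in seeds.items() if lab == c])
                 for c in clusters}

    assign = [None] * len(points)
    for _ in range(n_iter):
        for i, p in enumerate(points):
            if i in seeds:
                # Constraint: a labeled point always keeps its given label.
                assign[i] = seeds[i]
            else:
                # Unlabeled points go to the nearest centroid (Euclidean distance).
                assign[i] = min(clusters, key=lambda c: math.dist(p, centroids[c]))
        # Every cluster keeps at least its seed points, so it is never empty.
        centroids = {c: mean([i for i, a in enumerate(assign) if a == c])
                     for c in clusters}
    return assign

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = seeded_kmeans(points, seeds={0: "a", 3: "b"})
```

Here only two of the six points carry labels, yet they are enough to fix both
the number of clusters and their starting positions.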
Conferences and Workshops
-
Learning for Semantic Parsing of Natural Language
(PPT file)
Reconnecting Computational Linguistics to Artificial Intelligence and
Cognitive Science ("Special event")
(PPT file)
Invited keynote lectures presented at
Computational Linguistics and Intelligent Text Processing: The
8th International Conference (CICLing-07), Mexico City,
February 22, 2007.
ABSTRACT:
Semantic parsing is the task of mapping a natural-language sentence
into a detailed formal representation of its meaning. This talk
presents a summary of our research on learning semantic parsers from
corpora of sentences annotated with formal representations. Our
original work employed inductive-logic programming methods to learn
deterministic symbolic parsers; our more recent work has applied
current techniques from statistical syntactic parsing, statistical
machine translation, and support vector machines using string kernels
to learn more robust semantic parsers. We present results on learning
to interpret natural language database queries and robot commands
(Robocup coaching instructions).
Joint work with Ruifang Ge, Rohit Kate, Yuk-Wah Wong, John Zelle, and Cynthia
Thompson
-
Learning for Semantic Parsing of Natural Language
(PPT file)
Invited keynote lecture presented
at the International Joint Conference on Artificial Intelligence (IJCAI)
2005 Workshop on Grammatical Inference Applications: Successes and Future
Challenges, Edinburgh, Scotland, Jul. 31, 2005.
ABSTRACT:
Semantic parsing is the process of mapping natural-language sentences into a
formal representation of their meaning. This talk presents a summary of our
research on learning semantic parsers from annotated corpora. Our original work
employed inductive-logic programming methods to learn deterministic symbolic
parsers; our more recent work uses either transformation rules or statistical
parsing methods to learn more robust semantic parsers. We present results on
learning to interpret natural language database queries and robot commands
(Robocup coaching instructions).
Joint work with Ruifang Ge, Rohit Kate, Yuk-Wah Wong, John Zelle, and Cynthia
Thompson
-
Diverse Ensembles for Active Learning
(PPT file)
presented at the 21st International Conference on Machine
Learning (ICML-2004), Banff, Canada, July 2004.
ABSTRACT:
Query by Committee is an effective approach to selective sampling in which
disagreement amongst an ensemble of hypotheses is used to select data for
labeling. Query by Bagging and Query by Boosting are two practical
implementations of this approach that use Bagging and Boosting, respectively,
to build the committees. For effective active learning, it is critical that the
committee be made up of consistent hypotheses that are very different from each
other. DECORATE is a recently developed method that directly constructs such
diverse committees using artificial training data. This paper introduces
Active-DECORATE, which uses DECORATE committees to select good training
examples. Extensive experimental results demonstrate that, in general,
Active-DECORATE outperforms both Query by Bagging and Query by Boosting.
Joint work with Prem Melville
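The Query by Committee idea in the abstract above, selecting the example on
which the committee disagrees most, can be sketched as follows. This is only
an illustration, not Active-DECORATE or the authors' code: vote entropy is
assumed here as the disagreement measure, and the names vote_entropy and
select_query, as well as the toy threshold classifiers, are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's votes; 0 when all members agree."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def select_query(committee, unlabeled):
    """Return the unlabeled example with the highest committee disagreement.

    committee: list of callables, each mapping an example to a predicted label.
    unlabeled: list of candidate examples.
    """
    return max(unlabeled,
               key=lambda x: vote_entropy([h(x) for h in committee]))

# Toy committee: three threshold classifiers that disagree only between 1 and 3.
committee = [lambda x: x > 1, lambda x: x > 2, lambda x: x > 3]
query = select_query(committee, [0, 2.5, 5])
```

In this toy run the committee is unanimous on 0 and on 5, so the example 2.5,
where the members split two to one, is the one chosen for labeling.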
-
Learning Semantic Parsers: An Important but Under-Studied Problem
(PPT file)
presented at the AAAI 2004 Spring Symposium on Language Learning: An
Interdisciplinary Perspective, Stanford, CA, March 2004.
ABSTRACT:
Computational systems that learn to transform natural-language sentences into
semantic representations have important practical applications in building
natural-language interfaces. They can also provide insight into important
issues in human language acquisition. However, within AI, computational
linguistics, and machine learning, there has been relatively little research on
developing systems that learn such semantic parsers. This paper briefly reviews
our own work in this area and presents semantic-parser acquisition as an
important challenge problem for AI.
Philosophical/Historical/Methodological Talks
-
All You Really Need to Know About Computer Science
Was Learned Pursuing Artificial Intelligence
(PPT file) (PS file)
presented Sept. 1, 2004, Dept. of Computer Sciences, Univ. of Texas at Austin,
and Cornell University 10/21/04.
ABSTRACT:
Most of the fundamental concepts in computing were developed by people
who were trying to understand, emulate, or augment the human mind.
This list of concepts includes Boolean logic, finite-state machines,
formal grammars, Turing machines, linked lists, recursion, garbage
collection, combinatorial search, automated theorem proving,
time-shared operating systems, computer networks, graphical user
interfaces, and computational complexity theory. This talk will
describe how the history of all of these fundamental computing
concepts is ultimately rooted in the historical pursuit of artificial
intelligence. Unfortunately, AI has since become increasingly
isolated from the rest of computer science, to the detriment of both.
I believe the time is ripe for a re-integration of AI into the rest of
computing. My goal is to start the semester with a light, mildly
entertaining, and potentially controversial talk that provokes thought
and discussion about the role of AI in the broader enterprise of
computer science.
-
Computing as an Experimental Science, or Exaggerated Formalist Rhetoric Considered Harmful
(PPT file)
presented Jan. 17, 2002, Dept. of Computer Sciences, Univ. of Texas at Austin.
ABSTRACT:
Some computing problems require an experimental rather than a formal
mathematical approach to evaluating correctness or average-case time
complexity. Exaggerated rhetoric of some formalists in computer science seems
to deny this fundamental proposition. I will explain and defend the
fundamental role of experimentation in the study of problems whose definitions
involve unformalized empirical phenomena in the world. I will also discuss the
lack of methodological rigor in most current experimental computer science and
its connection to educational and curricular deficiencies. My goal is to start
the semester with a potentially controversial but mildly entertaining talk that
provokes thought and discussion about important methodological and
philosophical issues regarding computing as a scientific discipline.
-
AI & Atheism: AI - Mind without Mysticism: Atheism - Life, the Universe,
and Everything without Mysticism
(PPT file)
presented Nov. 30, 2001, Forum for AI, Debate with B. Kuipers on "AI and
Religion", Dept. of Computer Sciences, Univ. of Texas at Austin.
ABSTRACT:
Both AI and Atheism depend on a philosophy of materialism while religion
relies on a philosophy of dualism. As such, I believe that belief in "strong
AI" is incompatible with traditional religious beliefs.
mooney@cs.utexas.edu