Machine Learning Research Group | University of Texas

Publications: Information Extraction

Information Extraction (IE) is a shallow form of text understanding that extracts substrings describing prespecified types of entities or relationships from documents and web pages. Our work has focused on machine learning methods that induce information extractors from manually labeled training examples. Our recent work has focused on IE for bioinformatics.

The RISE web site is a useful general information resource on IE.

Stacking With Auxiliary Features
[Details] [PDF] [Slides (PDF)] [Poster]
Nazneen Fatema Rajani and Raymond J. Mooney
In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), 2634-2640, Melbourne, Australia, 2017.
Stacking With Auxiliary Features: Improved Ensembling for Natural Language and Vision
[Details] [PDF] [Slides (PDF)]
Nazneen Fatema Rajani
November 2016. PhD proposal, Department of Computer Science, The University of Texas at Austin.
Stacked Ensembles of Information Extractors for Knowledge-Base Population
[Details] [PDF] [Slides (PPT)]
Vidhoon Viswanathan, Nazneen Fatema Rajani, Yinon Bentor and Raymond J. Mooney
In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-15), 177-187, Beijing, China, July 2015.
Knowledge Base Population using Stacked Ensembles of Information Extractors
[Details] [PDF]
Vidhoon Viswanathan
Masters Thesis, Department of Computer Science, The University of Texas at Austin, May 2015.
University of Texas at Austin KBP 2014 Slot Filling System: Bayesian Logic Programs for Textual Inference
[Details] [PDF]
Yinon Bentor, Vidhoon Viswanathan and Raymond Mooney
In Proceedings of the Seventh Text Analysis Conference: Knowledge Base Population (TAC 2014), 2014.
University of Texas at Austin KBP 2013 Slot Filling System: Bayesian Logic Programs for Textual Inference
[Details] [PDF]
Yinon Bentor, Amelia Harrison, Shruti Bhosale and Raymond Mooney
In Proceedings of the Sixth Text Analysis Conference (TAC 2013), 2013.
Online Inference-Rule Learning from Natural-Language Extractions
[Details] [PDF] [Poster]
Sindhu Raghavan and Raymond J. Mooney
In Proceedings of the 3rd Statistical Relational AI (StaRAI-13) workshop at AAAI '13, July 2013.
Bayesian Logic Programs for Plan Recognition and Machine Reading
[Details] [PDF] [Slides (PPT)]
Sindhu Raghavan
PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2012. 170 pages.
Learning to "Read Between the Lines" using Bayesian Logic Programs
[Details] [PDF] [Slides (PPT)]
Sindhu Raghavan, Raymond J. Mooney and Hyeonseo Ku
In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), 349-358, July 2012.
Fine-Grained Class Label Markup of Search Queries
[Details] [PDF]
Joseph Reisinger and Marius Pasca
In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), 1200-1209, June 2011.
Extending Bayesian Logic Programs for Plan Recognition and Machine Reading
[Details] [PDF] [Slides (PPT)]
Sindhu V. Raghavan
Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, May 2011.
Joint Entity and Relation Extraction using Card-Pyramid Parsing
[Details] [PDF] [Slides (PPT)]
Rohit J. Kate and Raymond J. Mooney
In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL-2010), 203-212, Uppsala, Sweden, July 2010.
Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction
[Details] [PDF]
Razvan Constantin Bunescu
PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2007. 150 pages. Also as Technical Report AI07-345, Artificial Intelligence Lab, University of Texas at Austin, August 2007.
Learning to Extract Relations from the Web using Minimal Supervision
[Details] [PDF]
Razvan C. Bunescu and Raymond J. Mooney
In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL'07), Prague, Czech Republic, June 2007.
Extracting Relations from Text: From Word Sequences to Dependency Paths
[Details] [PDF]
Razvan C. Bunescu and Raymond J. Mooney
In A. Kao and S. Poteet, editors, Natural Language Processing and Text Mining, 29-44, Berlin, 2007. Springer Verlag.
Statistical Relational Learning for Natural Language Information Extraction
[Details] [PDF]
Razvan Bunescu and Raymond J. Mooney
In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning, 535-552, Cambridge, MA, 2007. MIT Press.
Learnable Similarity Functions and Their Application to Record Linkage and Clustering
[Details] [PDF]
Mikhail Bilenko
PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2006. 136 pages.
Integrating Co-occurrence Statistics with Information Extraction for Robust Retrieval of Protein Interactions from Medline
[Details] [PDF]
Razvan Bunescu, Raymond Mooney, Arun Ramani and Edward Marcotte
In Proceedings of the HLT-NAACL Workshop on Linking Natural Language Processing and Biology (BioNLP'06), 49-56, New York, NY, June 2006.
Using Encyclopedic Knowledge for Named Entity Disambiguation
[Details] [PDF]
Razvan Bunescu and Marius Pasca
In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), 9-16, Trento, Italy, 2006.
Subsequence Kernels for Relation Extraction
[Details] [PDF]
Razvan Bunescu and Raymond J. Mooney
In Y. Weiss, B. Schoelkopf, J. Platt, editors, Advances in Neural Information Processing Systems, Vol. 18: Proceedings of the 2005 Conference (NIPS), 2006.
A Shortest Path Dependency Kernel for Relation Extraction
[Details] [PDF]
R. C. Bunescu and Raymond J. Mooney
In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP-05), 724-731, Vancouver, BC, October 2005.
Consolidating the Set of Known Human Protein-Protein Interactions in Preparation for Large-Scale Mapping of the Human Interactome
[Details] [PDF]
A.K. Ramani, R.C. Bunescu, Raymond J. Mooney and E.M. Marcotte
Genome Biology, 6(5):r40, 2005.
Mining Knowledge from Text Using Information Extraction
[Details] [PDF]
Raymond J. Mooney and R. Bunescu
SIGKDD Explorations (special issue on Text Mining and Natural Language Processing), 7(1):3-10, 2005.
Using Biomedical Literature Mining to Consolidate the Set of Known Human Protein-Protein Interactions
[Details] [PDF]
A. Ramani, E. Marcotte, R. Bunescu and Raymond J. Mooney
In Proceedings of the ISMB/ACL-05 Workshop of the BioLINK SIG: Linking Literature, Information and Knowledge for Biology, Detroit, MI, June 2005.
Learning for Collective Information Extraction
[Details] [PDF]
Razvan C. Bunescu
Technical Report TR-05-02, Department of Computer Sciences, University of Texas at Austin, October 2005. Ph.D. proposal.
Comparative Experiments on Learning Information Extractors for Proteins and their Interactions
[Details] [PDF]
Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Edward M. Marcotte, Raymond J. Mooney, Arun Kumar Ramani, and Yuk Wah Wong
Artificial Intelligence in Medicine (special issue on Summarization and Information Extraction from Medical Documents)(2):139-155, 2005.
Text Mining with Information Extraction
[Details] [PDF]
Un Yong Nahm
PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2004. 217 pages. Also appears as Technical Report UT-AI-TR-04-311.
Collective Information Extraction with Relational Markov Networks
[Details] [PDF]
Razvan Bunescu and Raymond J. Mooney
In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), 439-446, Barcelona, Spain, July 2004.
Using Soft-Matching Mined Rules to Improve Information Extraction
[Details] [PDF]
Un Yong Nahm and Raymond J. Mooney
In Proceedings of the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), 27-32, San Jose, CA, July 2004.
Relational Markov Networks for Collective Information Extraction
[Details] [PDF]
Razvan Bunescu and Raymond J. Mooney
In Proceedings of the ICML-04 Workshop on Statistical Relational Learning and its Connections to Other Fields, Banff, Alberta, July 2004.
Learning to Extract Proteins and their Interactions from Medline Abstracts
[Details] [PDF]
Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Raymond J. Mooney, Yuk Wah Wong, Edward M. Marcotte, and Arun Kumar Ramani
In Proceedings of the ICML-03 Workshop on Machine Learning in Bioinformatics, 46-53, Washington, DC, August 2003.
Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction
[Details] [PDF]
Mary Elaine Califf and Raymond J. Mooney
Journal of Machine Learning Research:177-210, 2003.
Property-Based Feature Engineering and Selection
[Details] [PDF]
Noppadon Kamolvilassatian
Masters Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, December 2002. 85 pages.
Extracting Gene and Protein Names from Biomedical Abstracts
[Details] [PDF]
Razvan Bunescu, Ruifang Ge, Raymond J. Mooney, Edward Marcotte, and Arun Kumar Ramani
March 2002. Unpublished Technical Note.
ELIXIR: A Library for Writing Wrappers in Java
[Details] [PDF]
Edward Wild
December 2001. Undergraduate Honors Thesis, Department of Computer Sciences, University of Texas at Austin.
A Mutually Beneficial Integration of Data Mining and Information Extraction
[Details] [PDF]
Un Yong Nahm and Raymond J. Mooney
In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), 627-632, Austin, TX, July 2000.
Relational Learning of Pattern-Match Rules for Information Extraction
[Details] [PDF]
Mary Elaine Califf and Raymond J. Mooney
In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), 328-334, Orlando, FL, July 1999.
Active Learning for Natural Language Parsing and Information Extraction
[Details] [PDF]
Cynthia A. Thompson, Mary Elaine Califf and Raymond J. Mooney
In Proceedings of the Sixteenth International Conference on Machine Learning (ICML-99), 406-414, Bled, Slovenia, June 1999.
Relational Learning Techniques for Natural Language Information Extraction
[Details] [PDF]
Mary Elaine Califf
PhD Thesis, Department of Computer Sciences, University of Texas, Austin, TX, August 1998. 142 pages. Also appears as Artificial Intelligence Laboratory Technical Report AI 98-276.
Relational Learning of Pattern-Match Rules for Information Extraction
[Details] [PDF]
Mary Elaine Califf and Raymond J. Mooney
In Proceedings of the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 6-11, Stanford, CA, March 1998.
Relational Learning Techniques for Natural Language Information Extraction
[Details] [PDF]
Mary Elaine Califf
1997. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
Applying ILP-based Techniques to Natural Language Information Extraction: An Experiment in Relational Learning
[Details] [PDF]
Mary Elaine Califf and Raymond J. Mooney
In Workshop Notes of the IJCAI-97 Workshop on Frontiers of Inductive Logic Programming, 7-11, Nagoya, Japan, August 1997.
Relational Learning of Pattern-Match Rules for Information Extraction
[Details] [PDF]
Mary Elaine Califf and Raymond J. Mooney
In Proceedings of the ACL Workshop on Natural Language Learning, 9-15, Madrid, Spain, July 1997.