Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Information Extraction

Information Extraction (IE) is a shallow form of text understanding that extracts substrings about prespecified types of entities or relationships from documents and web pages. Our work has focused on machine learning methods that induce information extractors from manually labeled training examples. More recently, we have focused on IE for bioinformatics.

The RISE web site is a useful general information resource on IE.

  1. Stacking With Auxiliary Features
    [Details] [PDF] [Slides (PDF)] [Poster]
    Nazneen Fatema Rajani and Raymond J. Mooney
    In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), 2634-2640, Melbourne, Australia, 2017.
  2. Stacking With Auxiliary Features: Improved Ensembling for Natural Language and Vision
    [Details] [PDF] [Slides (PDF)]
    Nazneen Fatema Rajani
    November 2016. PhD proposal, Department of Computer Science, The University of Texas at Austin.
  3. Stacked Ensembles of Information Extractors for Knowledge-Base Population
    [Details] [PDF] [Slides (PPT)]
    Vidhoon Viswanathan and Nazneen Fatema Rajani and Yinon Bentor and Raymond J. Mooney
    In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL-15), 177-187, Beijing, China, July 2015.
  4. Knowledge Base Population using Stacked Ensembles of Information Extractors
    [Details] [PDF]
    Vidhoon Viswanathan
    Masters Thesis, Department of Computer Science, The University of Texas at Austin, May 2015.
  5. University of Texas at Austin KBP 2014 Slot Filling System: Bayesian Logic Programs for Textual Inference
    [Details] [PDF]
    Yinon Bentor and Vidhoon Viswanathan and Raymond Mooney
    In Proceedings of the Seventh Text Analysis Conference: Knowledge Base Population (TAC 2014), 2014.
  6. University of Texas at Austin KBP 2013 Slot Filling System: Bayesian Logic Programs for Textual Inference
    [Details] [PDF]
    Yinon Bentor and Amelia Harrison and Shruti Bhosale and Raymond Mooney
    In Proceedings of the Sixth Text Analysis Conference (TAC 2013), 2013.
  7. Online Inference-Rule Learning from Natural-Language Extractions
    [Details] [PDF] [Poster]
    Sindhu Raghavan and Raymond J. Mooney
    In Proceedings of the 3rd Statistical Relational AI (StaRAI-13) workshop at AAAI '13, July 2013.
  8. Bayesian Logic Programs for Plan Recognition and Machine Reading
    [Details] [PDF] [Slides (PPT)]
    Sindhu Raghavan
    PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2012. 170 pages.
  9. Learning to "Read Between the Lines" using Bayesian Logic Programs
    [Details] [PDF] [Slides (PPT)]
    Sindhu Raghavan and Raymond J. Mooney and Hyeonseo Ku
    In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), 349-358, July 2012.
  10. Fine-Grained Class Label Markup of Search Queries
    [Details] [PDF]
    Joseph Reisinger and Marius Pasca
    In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), 1200-1209, June 2011.
  11. Extending Bayesian Logic Programs for Plan Recognition and Machine Reading
    [Details] [PDF] [Slides (PPT)]
    Sindhu V. Raghavan
    Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, May 2011.
  12. Joint Entity and Relation Extraction using Card-Pyramid Parsing
    [Details] [PDF] [Slides (PPT)]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the Fourteenth Conference on Computational Natural Language Learning (CoNLL-2010), 203-212, Uppsala, Sweden, July 2010.
  13. Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction
    [Details] [PDF]
    Razvan Constantin Bunescu
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2007. 150 pages. Also as Technical Report AI07-345, Artificial Intelligence Lab, University of Texas at Austin, August 2007.
  14. Learning to Extract Relations from the Web using Minimal Supervision
    [Details] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL'07), Prague, Czech Republic, June 2007.
  15. Extracting Relations from Text: From Word Sequences to Dependency Paths
    [Details] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In A. Kao and S. Poteet, editors, Natural Language Processing and Text Mining, 29-44, Berlin, 2007. Springer Verlag.
  16. Statistical Relational Learning for Natural Language Information Extraction
    [Details] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning, 535-552, Cambridge, MA, 2007. MIT Press.
  17. Learnable Similarity Functions and Their Application to Record Linkage and Clustering
    [Details] [PDF]
    Mikhail Bilenko
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2006. 136 pages.
  18. Integrating Co-occurrence Statistics with Information Extraction for Robust Retrieval of Protein Interactions from Medline
    [Details] [PDF]
    Razvan Bunescu, Raymond Mooney, Arun Ramani and Edward Marcotte
    In Proceedings of the HLT-NAACL Workshop on Linking Natural Language Processing and Biology (BioNLP'06), 49-56, New York, NY, June 2006.
  19. Using Encyclopedic Knowledge for Named Entity Disambiguation
    [Details] [PDF]
    Razvan Bunescu and Marius Pasca
    In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), 9-16, Trento, Italy, 2006.
  20. Subsequence Kernels for Relation Extraction
    [Details] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    In Y. Weiss, B. Schoelkopf, J. Platt, editors, Advances in Neural Information Processing Systems, Vol. 18: Proceedings of the 2005 Conference (NIPS), 2006.
  21. A Shortest Path Dependency Kernel for Relation Extraction
    [Details] [PDF]
    R. C. Bunescu and Raymond J. Mooney
    In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP-05), 724-731, Vancouver, BC, October 2005.
  22. Consolidating the Set of Known Human Protein-Protein Interactions in Preparation for Large-Scale Mapping of the Human Interactome
    [Details] [PDF]
    A.K. Ramani, R.C. Bunescu, Raymond J. Mooney and E.M. Marcotte
    Genome Biology, 6(5):r40, 2005.
  23. Mining Knowledge from Text Using Information Extraction
    [Details] [PDF]
    Raymond J. Mooney and R. Bunescu
    SIGKDD Explorations (special issue on Text Mining and Natural Language Processing), 7(1):3-10, 2005.
  24. Using Biomedical Literature Mining to Consolidate the Set of Known Human Protein-Protein Interactions
    [Details] [PDF]
    A. Ramani, E. Marcotte, R. Bunescu and Raymond J. Mooney
    In Proceedings of the ISMB/ACL-05 Workshop of the BioLINK SIG: Linking Literature, Information and Knowledge for Biology, Detroit, MI, June 2005.
  25. Learning for Collective Information Extraction
    [Details] [PDF]
    Razvan C. Bunescu
    Technical Report TR-05-02, Department of Computer Sciences, University of Texas at Austin, October 2005. Ph.D. proposal.
  26. Comparative Experiments on Learning Information Extractors for Proteins and their Interactions
    [Details] [PDF]
    Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Edward M. Marcotte, Raymond J. Mooney, Arun Kumar Ramani, and Yuk Wah Wong
    Artificial Intelligence in Medicine (special issue on Summarization and Information Extraction from Medical Documents)(2):139-155, 2005.
  27. Text Mining with Information Extraction
    [Details] [PDF]
    Un Yong Nahm
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2004. 217 pages. Also appears as Technical Report UT-AI-TR-04-311.
  28. Collective Information Extraction with Relational Markov Networks
    [Details] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), 439-446, Barcelona, Spain, July 2004.
  29. Using Soft-Matching Mined Rules to Improve Information Extraction
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), 27-32, San Jose, CA, July 2004.
  30. Relational Markov Networks for Collective Information Extraction
    [Details] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    In Proceedings of the ICML-04 Workshop on Statistical Relational Learning and its Connections to Other Fields, Banff, Alberta, July 2004.
  31. Learning to Extract Proteins and their Interactions from Medline Abstracts
    [Details] [PDF]
    Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Raymond J. Mooney, Yuk Wah Wong, Edward M. Marcotte, and Arun Kumar Ramani
    In Proceedings of the ICML-03 Workshop on Machine Learning in Bioinformatics, 46-53, Washington, DC, August 2003.
  32. Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    Journal of Machine Learning Research:177-210, 2003.
  33. Property-Based Feature Engineering and Selection
    [Details] [PDF]
    Noppadon Kamolvilassatian
    Masters Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, December 2002. 85 pages.
  34. Extracting Gene and Protein Names from Biomedical Abstracts
    [Details] [PDF]
    Razvan Bunescu, Ruifang Ge, Raymond J. Mooney, Edward Marcotte, and Arun Kumar Ramani
    March 2002. Unpublished Technical Note.
  35. ELIXIR: A Library for Writing Wrappers in Java
    [Details] [PDF]
    Edward Wild
    December 2001. Undergraduate Honor Thesis, Department of Computer Sciences, University of Texas at Austin.
  36. A Mutually Beneficial Integration of Data Mining and Information Extraction
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), 627-632, Austin, TX, July 2000.
  37. Relational Learning of Pattern-Match Rules for Information Extraction
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), 328-334, Orlando, FL, July 1999.
  38. Active Learning for Natural Language Parsing and Information Extraction
    [Details] [PDF]
    Cynthia A. Thompson, Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of the Sixteenth International Conference on Machine Learning (ICML-99), 406-414, Bled, Slovenia, June 1999.
  39. Relational Learning Techniques for Natural Language Information Extraction
    [Details] [PDF]
    Mary Elaine Califf
    PhD Thesis, Department of Computer Sciences, University of Texas, Austin, TX, August 1998. 142 pages. Also appears as Artificial Intelligence Laboratory Technical Report AI 98-276.
  40. Relational Learning of Pattern-Match Rules for Information Extraction
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 6-11, Stanford, CA, March 1998.
  41. Relational Learning Techniques for Natural Language Information Extraction
    [Details] [PDF]
    Mary Elaine Califf
    1997. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
  42. Applying ILP-based Techniques to Natural Language Information Extraction: An Experiment in Relational Learning
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    In Workshop Notes of the IJCAI-97 Workshop on Frontiers of Inductive Logic Programming, 7-11, Nagoya, Japan, August 1997.
  43. Relational Learning of Pattern-Match Rules for Information Extraction
    [Details] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    In Proceedings of the ACL Workshop on Natural Language Learning, 9-15, Madrid, Spain, July 1997.