UT ML Group: Information Extraction

Information Extraction (IE) is a shallow form of text understanding that extracts substrings about prespecified types of entities or relationships from documents and web pages. Our work has focused on machine learning methods that induce information extractors from manually labeled training examples. More recently, we have concentrated on IE for bioinformatics.

The RISE web site is a useful general information resource on IE.

Publications

  1. Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction [Abstract] [PDF]
    Razvan Constantin Bunescu
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2007.
    150 pages.
    Also appears as Technical Report AI07-345, Artificial Intelligence Lab, University of Texas at Austin, August 2007.

  2. Learning to Extract Relations from the Web using Minimal Supervision [Abstract] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), Prague, Czech Republic, pp. 576-583, June 2007.

  3. Extracting Relations from Text: From Word Sequences to Dependency Paths [Abstract] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    Text Mining and Natural Language Processing, Anne Kao and Steve Poteet (eds.), pp. 29-44, Springer, 2007.

  4. Learnable Similarity Functions and Their Application to Record Linkage and Clustering [Abstract] [PDF]
    Mikhail Bilenko
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2006.
    136 pages.

  5. Integrating Co-occurrence Statistics with Information Extraction for Robust Retrieval of Protein Interactions from Medline [Abstract] [PDF]
    Razvan Bunescu, Raymond Mooney, Arun Ramani and Edward Marcotte
    In Proceedings of the HLT-NAACL Workshop on Linking Natural Language Processing and Biology: Towards deeper biological literature analysis (BioNLP-2006), pp. 49-56, New York City, NY, June 2006.

  6. A Shortest Path Dependency Kernel for Relation Extraction [Abstract] [PDF]
    Bunescu, R. C., and Mooney, R.J.
    In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, B.C., pp. 724-731, October 2005.

  7. Consolidating the Set of Known Human Protein-Protein Interactions in Preparation for Large-Scale Mapping of the Human Interactome [Abstract] [PDF]
    Ramani, A.K., Bunescu, R.C., Mooney, R.J. and Marcotte, E.M.
    Genome Biology, 6, 5, r40 (2005).

  8. Mining Knowledge from Text Using Information Extraction [Abstract] [PDF]
    Mooney, R. J. and Bunescu, R.
    SIGKDD Explorations (special issue on Text Mining and Natural Language Processing), 7, 1 (2005), pp. 3-10.

  9. Subsequence Kernels for Relation Extraction [Abstract] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    Advances in Neural Information Processing Systems, Vol. 18: Proceedings of the 2005 Conference (NIPS), Y. Weiss, B. Schoelkopf, J. Platt (Eds.), MIT Press, 2006.

  10. Statistical Relational Learning for Natural Language Information Extraction [Abstract] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    Introduction to Statistical Relational Learning, Getoor, L. and Taskar, B. (Eds.), pp. 535-552, MIT Press, Cambridge, MA, 2007.

  11. Using Biomedical Literature Mining to Consolidate the Set of Known Human Protein-Protein Interactions [Abstract] [PDF]
    Ramani, A., Marcotte E., Bunescu, R., and Mooney, R.J.
    Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, pp. 46-53, Detroit, MI, June 2005.

  12. Text Mining with Information Extraction [Abstract] [PDF]
    Raymond J. Mooney and Un Yong Nahm
    Multilingualism and Electronic Language Management: Proceedings of the 4th International MIDP Colloquium, 22-23 September 2003, Bloemfontein, South Africa, Daelemans, W., du Plessis, T., Snyman, C. and Teck, L. (Eds.), pp. 141-160, Van Schaik Pub., South Africa, 2005.

  13. Learning for Collective Information Extraction [Abstract] [PDF]
    Razvan C. Bunescu
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, October 2004.
    45 pages.
    Also appears as Technical Report TR-05-02, Artificial Intelligence Lab, University of Texas at Austin, February 2005.

  14. Text Mining with Information Extraction [Abstract] [PDF]
    Un Yong Nahm
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2004.
    217 pages.
    Also appears as Technical Report UT-AI-TR-04-311, Artificial Intelligence Lab, University of Texas at Austin, October 2004.

  15. Collective Information Extraction with Relational Markov Networks [Abstract] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-2004), pp. 439-446, Barcelona, Spain, July 2004.

  16. Using Soft-Matching Mined Rules to Improve Information Extraction [Abstract] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), pp. 27-32, San Jose, CA, July 2004.

  17. Relational Markov Networks for Collective Information Extraction [Abstract] [PDF]
    Razvan Bunescu and Raymond J. Mooney
    Proceedings of the ICML-2004 Workshop on Statistical Relational Learning and its Connections to Other Fields (SRL-2004), Banff, Canada, July 2004.

  18. Comparative Experiments on Learning Information Extractors for Proteins and their Interactions [Abstract] [PDF]
    Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Edward M. Marcotte, Raymond J. Mooney, Arun Kumar Ramani, and Yuk Wah Wong
    Artificial Intelligence in Medicine (Special Issue on Summarization and Information Extraction from Medical Documents), 33, 2 (2005), pp. 139-155.

  19. Learning to Extract Proteins and their Interactions from Medline Abstracts [Abstract] [PDF]
    Razvan Bunescu, Ruifang Ge, Rohit J. Kate, Raymond J. Mooney, Yuk Wah Wong, Edward M. Marcotte, and Arun Kumar Ramani
    Proceedings of the ICML-2003 Workshop on Machine Learning in Bioinformatics, pp. 46-53, Washington DC, August 2003.

  20. Bottom-Up Relational Learning of Pattern Matching Rules for Information Extraction [Abstract] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    Journal of Machine Learning Research, 4 (2003), pp. 177-210.

  21. Property-Based Feature Engineering and Selection [Abstract] [PDF]
    Noppadon Kamolvilassatian
    M.A. Thesis, Department of Computer Sciences, University of Texas at Austin, December 2002.
    85 pages.

  22. Extracting Gene and Protein Names from Biomedical Abstracts [Abstract] [PDF]
    Razvan Bunescu, Ruifang Ge, Raymond J. Mooney, Edward Marcotte, and Arun Kumar Ramani
    Unpublished Technical Note, March 2002.

  23. ELIXIR: A Library for Writing Wrappers in Java [Abstract] [PDF]
    Edward Wild
    Undergraduate Honors Thesis, Department of Computer Sciences, University of Texas at Austin, December 2001.

  24. A Mutually Beneficial Integration of Data Mining and Information Extraction [Abstract] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), Austin, TX, pp. 627-632, July 2000.

  25. Relational Learning of Pattern-Match Rules for Information Extraction [Abstract] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), Orlando, FL, pp. 328-334, July 1999.

  26. Active Learning for Natural Language Parsing and Information Extraction [Abstract] [PDF]
    Cynthia A. Thompson, Mary Elaine Califf and Raymond J. Mooney
    Nominated for Best Paper Award
    Proceedings of the Sixteenth International Machine Learning Conference (ICML-99), Bled, Slovenia, pp. 406-414, June 1999.

  27. Relational Learning Techniques for Natural Language Information Extraction [Abstract] [PDF]
    Mary Elaine Califf
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 1998.
    142 pages.
    Also appears as Technical Report AI 98-276, Artificial Intelligence Lab, University of Texas at Austin.

  28. Relational Learning of Pattern-Match Rules for Information Extraction [Abstract] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    Proceedings of AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pp. 6-11, Stanford, CA, March 1998.

  29. Relational Learning Techniques for Natural Language Information Extraction [Abstract] [PDF]
    Mary Elaine Califf
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, 1997.
    27 pages.

  30. Applying ILP-based Techniques to Natural Language Information Extraction: An Experiment in Relational Learning [Abstract] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    Workshop Notes of the IJCAI-97 Workshop on Frontiers of Inductive Logic Programming, pp. 7-11, Nagoya, Japan, August 1997.

  31. Relational Learning of Pattern-Match Rules for Information Extraction [Abstract] [PDF]
    Mary Elaine Califf and Raymond J. Mooney
    Proceedings of the ACL Workshop on Natural Language Learning, pp. 9-15, Madrid, Spain, July 1997.


mooney@cs.utexas.edu