UT ML Group: Publication: Text Data Mining

Text data mining concerns the application of data mining (knowledge discovery in databases, KDD) to unstructured textual data. Our work focuses on first using information extraction to build a structured database from a corpus of natural language texts and then discovering patterns in the resulting database with traditional KDD tools. It also covers record linkage, a form of data cleaning that identifies equivalent but textually distinct items in the extracted data prior to mining. This work is related to our research on natural language learning. Our recent work has focused on text mining for bioinformatics.

This research was formerly supported by the National Science Foundation through grant IIS-0117308 from the "Information and Data Management" Program.

Publications

  1. Spherical Topic Models [Abstract] [PDF]
    Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond Mooney
    To appear in NIPS'09 workshop: Applications for Topic Models: Text and Beyond

  2. Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction [Abstract] [PDF]
    Razvan Constantin Bunescu
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2007.
    150 pages.
    Also appears as Technical Report AI07-345, Artificial Intelligence Lab, University of Texas at Austin, August 2007.

  3. Learning to Extract Relations from the Web using Minimal Supervision [Abstract] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), Prague, Czech Republic, pp. 576-583, June 2007.

  4. Extracting Relations from Text: From Word Sequences to Dependency Paths [Abstract] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    Text Mining and Natural Language Processing, Anne Kao and Steve Poteet (eds.), pp. 29-44, Springer, 2007.

  5. Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping [Abstract] [PDF]
    Mikhail Bilenko, Sugato Basu, and Mehran Sahami
    Appears in Proceedings of the 5th International Conference on Data Mining (ICDM-2005), Houston, TX, pp. 58-65, November 2005.

  6. Alignments and String Similarity in Information Integration: A Random Field Approach [Abstract] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    Appears in Proceedings of the 2005 Dagstuhl Seminar on Machine Learning for the Semantic Web, Dagstuhl, Germany, February 2005.

  7. Mining Knowledge from Text Using Information Extraction [Abstract] [PDF]
    Raymond J. Mooney and Razvan C. Bunescu
    SIGKDD Explorations (special issue on Text Mining and Natural Language Processing), 7, 1 (2005), pp. 3-10.

  8. Learning for Collective Information Extraction [Abstract] [PDF]
    Razvan C. Bunescu
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, October 2004.
    45 pages.
    Also appears as Technical Report TR-05-02, Artificial Intelligence Lab, University of Texas at Austin, February 2005.

  9. Text Mining with Information Extraction [Abstract] [PDF]
    Un Yong Nahm
    Ph.D. Thesis, Department of Computer Sciences, University of Texas at Austin, August 2004.
    217 pages.
    Also appears as Technical Report UT-AI-TR-04-311, Artificial Intelligence Lab, University of Texas at Austin, October 2004.

  10. Using Soft-Matching Mined Rules to Improve Information Extraction [Abstract] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), pp. 27-32, San Jose, CA, July 2004.

  11. Learnable Similarity Functions and Their Applications to Clustering and Record Linkage [Abstract] [PDF]
    Mikhail Bilenko
    Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, pp. 981-982, San Jose, CA, July 2004.

  12. Text Mining with Information Extraction [Abstract] [PDF]
    Raymond J. Mooney and Un Yong Nahm
    Multilingualism and Electronic Language Management: Proceedings of the 4th International MIDP Colloquium, 22-23 September 2003, Bloemfontein, South Africa, Daelemans, W., du Plessis, T., Snyman, C. and Teck, L. (Eds.), pp. 141-160, Van Schaik Pub., South Africa, 2005.

  13. Learnable Similarity Functions and Their Applications to Record Linkage and Clustering [Abstract] [PDF]
    Mikhail Bilenko
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, October 2003.
    47 pages.
    Also appears as Technical Report UT-AI-TR-03-305, Artificial Intelligence Lab, University of Texas at Austin, December 2003.

  14. Adaptive Name-Matching in Information Integration [Abstract] [PDF]
    Mikhail Bilenko, William W. Cohen, Stephen Fienberg, Raymond J. Mooney, and Pradeep Ravikumar
    IEEE Intelligent Systems, 18(5), pp. 16-23, September/October 2003.

  15. On Evaluation and Training-Set Construction for Duplicate Detection [Abstract] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    Proceedings of the KDD-2003 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, pp. 7-12, Washington DC, August 2003.

  16. Adaptive Duplicate Detection Using Learnable String Similarity Measures [Abstract] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 39-48, Washington DC, August 2003.

  17. Employing Trainable String Similarity Metrics for Information Integration [Abstract] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web, pp. 67-72, Acapulco, Mexico, August 2003.

  18. Mining Soft-Matching Association Rules [Abstract] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM-2002) (short paper), pp. 681-683, McLean, VA, November 2002.

  19. Two Approaches to Handling Noisy Variation in Text Mining [Abstract] [PDF]
    Un Yong Nahm, Mikhail Bilenko, and Raymond J. Mooney
    Proceedings of the ICML-2002 Workshop on Text Learning (TextML'2002), pp. 18-27, Sydney, Australia, July 2002.

  20. Text Mining with Information Extraction [Abstract] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases, pp. 60-67, Stanford, CA, March 2002.

  21. Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases [Abstract] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    Technical Report AI 02-296, Artificial Intelligence Lab, University of Texas at Austin, February 2002.

  22. Evaluating the Novelty of Text-Mined Rules using Lexical Knowledge [Abstract] [PDF]
    Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupuleti, and Joydeep Ghosh
    Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001) (short paper), pp. 233-238, San Francisco, CA, August 2001.

  23. Mining Soft-Matching Rules from Textual Data [Abstract] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), pp. 979-984, Seattle, WA, August 2001.

  24. Using Lexical Knowledge to Evaluate the Novelty of Rules Mined from Text [Abstract] [PDF]
    Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupuleti, and Joydeep Ghosh
    Proceedings of NAACL 2001 Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, pp. 144-149, Pittsburgh, PA, June 2001.

  25. Text Mining with Information Extraction [Abstract] [PDF]
    Un Yong Nahm
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, February 2001.
    57 pages.

  26. Using Information Extraction to Aid the Discovery of Prediction Rules from Text [Abstract] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000) Workshop on Text Mining, pp. 51-58, Boston, MA, August 2000.

  27. A Mutually Beneficial Integration of Data Mining and Information Extraction [Abstract] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), Austin, TX, pp. 627-632, July 2000.


mooney@cs.utexas.edu