Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Text Data Mining

Text data mining applies data mining (knowledge discovery in databases, KDD) to unstructured textual data. Our work uses information extraction to first build a structured database from a corpus of natural language texts, and then discovers patterns in that database using traditional KDD tools. It also covers record linkage, a form of data cleaning that identifies equivalent but textually distinct items in the extracted data prior to mining. This work is related to our research on natural language learning, and our recent work has focused on text mining for bioinformatics.

This research was formerly supported by the National Science Foundation through grant IIS-0117308 from the "Information and Data Management" Program.

  1. Review Quality Aware Collaborative Filtering
    [Details] [PDF]
    Sindhu Raghavan, Suriya Gunasekar, and Joydeep Ghosh
    In Sixth ACM Conference on Recommender Systems (RecSys 2012), 123-130, September 2012.
  2. Improving Video Activity Recognition using Object Recognition and Text Mining
    [Details] [PDF] [Slides]
    Tanvi S. Motwani and Raymond J. Mooney
    In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI-2012), 600-605, August 2012.
  3. Extending Bayesian Logic Programs for Plan Recognition and Machine Reading
    [Details] [PDF] [Slides]
    Sindhu V. Raghavan
    Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, May 2011.
  4. Spherical Topic Models
    [Details] [PDF] [Slides]
    Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J. Mooney
    In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 2010.
  5. Spherical Topic Models
    [Details] [PDF]
    Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond Mooney
    In NIPS'09 workshop: Applications for Topic Models: Text and Beyond, 2009.
  6. Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction
    [Details] [PDF]
    Razvan Constantin Bunescu
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2007. 150 pages. Also as Technical Report AI07-345, Artificial Intelligence Lab, University of Texas at Austin, August 2007.
  7. Learning to Extract Relations from the Web using Minimal Supervision
    [Details] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL'07), Prague, Czech Republic, June 2007.
  8. Extracting Relations from Text: From Word Sequences to Dependency Paths
    [Details] [PDF]
    Razvan C. Bunescu and Raymond J. Mooney
    In A. Kao and S. Poteet, editors, Natural Language Processing and Text Mining, 29-44, Berlin, 2007. Springer Verlag.
  9. Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping
    [Details] [PDF]
    Mikhail Bilenko, Sugato Basu, and Mehran Sahami
    In Proceedings of the 5th International Conference on Data Mining (ICDM-2005), 58-65, Houston, TX, November 2005.
  10. Alignments and String Similarity in Information Integration: A Random Field Approach
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    In Proceedings of the 2005 Dagstuhl Seminar on Machine Learning for the Semantic Web, Dagstuhl, Germany, February 2005.
  11. Mining Knowledge from Text Using Information Extraction
    [Details] [PDF]
    Raymond J. Mooney and Razvan Bunescu
    SIGKDD Explorations (special issue on Text Mining and Natural Language Processing), 7(1):3-10, 2005.
  12. Learning for Collective Information Extraction
    [Details] [PDF]
    Razvan C. Bunescu
    Technical Report TR-05-02, Department of Computer Sciences, University of Texas at Austin, October 2005. Ph.D. proposal.
  13. Text Mining with Information Extraction
    [Details] [PDF]
    Un Yong Nahm
    PhD Thesis, Department of Computer Sciences, University of Texas at Austin, Austin, TX, August 2004. 217 pages. Also appears as Technical Report UT-AI-TR-04-311.
  14. Using Soft-Matching Mined Rules to Improve Information Extraction
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), 27-32, San Jose, CA, July 2004.
  15. Learnable Similarity Functions and Their Applications to Clustering and Record Linkage
    [Details] [PDF]
    Mikhail Bilenko
    In Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, 981-982, San Jose, CA, July 2004.
  16. Text Mining with Information Extraction
    [Details] [PDF]
    Raymond J. Mooney and Un Yong Nahm
    In W. Daelemans, T. du Plessis, C. Snyman, and L. Teck, editors, Multilingualism and Electronic Language Management: Proceedings of the 4th International MIDP Colloquium, 141-160, Bloemfontein, South Africa, September 2003. Van Schaik: South Africa.
  17. Learnable Similarity Functions and Their Applications to Record Linkage and Clustering
    [Details] [PDF]
    Mikhail Bilenko
    2003. Doctoral Dissertation Proposal, University of Texas at Austin.
  18. Adaptive Name-Matching in Information Integration
    [Details] [PDF]
    Mikhail Bilenko, William W. Cohen, Stephen Fienberg, Raymond J. Mooney, and Pradeep Ravikumar
    IEEE Intelligent Systems, 18(5):16-23, 2003.
  19. On Evaluation and Training-Set Construction for Duplicate Detection
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    In Proceedings of the KDD-03 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, 7-12, Washington, DC, August 2003.
  20. Adaptive Duplicate Detection Using Learnable String Similarity Measures
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), 39-48, Washington, DC, August 2003.
  21. Employing Trainable String Similarity Metrics for Information Integration
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    In Proceedings of the IJCAI-03 Workshop on Information Integration on the Web, 67-72, Acapulco, Mexico, August 2003.
  22. Mining Soft-Matching Association Rules
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM-2002), 681-683, McLean, VA, November 2002.
  23. Two Approaches to Handling Noisy Variation in Text Mining
    [Details] [PDF]
    Un Yong Nahm, Mikhail Bilenko, and Raymond J. Mooney
    In Papers from the Nineteenth International Conference on Machine Learning (ICML-2002) Workshop on Text Learning, 18-27, Sydney, Australia, July 2002.
  24. Text Mining with Information Extraction
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases, 60-67, Stanford, CA, March 2002.
  25. Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases
    [Details] [PDF]
    Mikhail Bilenko and Raymond J. Mooney
    Technical Report AI 02-296, Artificial Intelligence Laboratory, University of Texas at Austin, Austin, TX, February 2002.
  26. Evaluating the Novelty of Text-Mined Rules using Lexical Knowledge
    [Details] [PDF]
    Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupuleti, and Joydeep Ghosh
    In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001), 233-239, San Francisco, CA, 2001.
  27. Mining Soft-Matching Rules from Textual Data
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.
  28. Using Lexical Knowledge to Evaluate the Novelty of Rules Mined from Text
    [Details] [PDF]
    Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupuleti, and Joydeep Ghosh
    In Proceedings of NAACL 2001 Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, 144-149, Pittsburgh, PA, June 2001.
  29. Text Mining with Information Extraction
    [Details] [PDF]
    Un Yong Nahm
    February 2001. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
  30. Using Information Extraction to Aid the Discovery of Prediction Rules from Text
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000) Workshop on Text Mining, 51-58, Boston, MA, August 2000.
  31. A Mutually Beneficial Integration of Data Mining and Information Extraction
    [Details] [PDF]
    Un Yong Nahm and Raymond J. Mooney
    In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), 627-632, Austin, TX, July 2000.