Text Data Mining

Text data mining applies data mining (knowledge discovery in databases, KDD) to unstructured textual data. Our work uses information extraction to first build a structured database from a corpus of natural language texts, and then discovers patterns in that database with traditional KDD tools. It also covers record linkage, a form of data cleaning that identifies equivalent but textually distinct items in the extracted data before mining. This area is closely related to our research on natural language learning. Our recent work has focused on text mining for bioinformatics.
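The three stages described above (extraction, record linkage, rule mining) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy job-posting corpus, the regular-expression extractor, the fixed difflib similarity threshold, and the helper names (extract, canonical, build_records, mine_rules) are hypothetical. They stand in for the learned extractors and learned similarity measures studied in the publications listed on this page, and do not reproduce them.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy corpus; real inputs would be free-form natural language.
POSTINGS = [
    "Title: Web Developer. Skills: Java, JavaScript, HTML.",
    "Title: Web developer. Skills: java script, HTML, Java.",
    "Title: Database Admin. Skills: SQL, Oracle.",
    "Title: Web Developer. Skills: HTML, JavaScript.",
]


def extract(text):
    """Toy information extraction: fill a title slot and a skills slot."""
    m = re.search(r"Title:\s*(.+?)\.\s*Skills:\s*(.+?)\.$", text)
    skills = [s.strip() for s in m.group(2).split(",")]
    return {"title": m.group(1).strip(), "skills": skills}


def canonical(value, known, threshold=0.85):
    """Toy record linkage: map a string onto an already-seen equivalent.

    Strings are lowercased and stripped of non-alphanumerics, then linked
    to the first known key whose similarity ratio meets the threshold.
    """
    key = re.sub(r"[^a-z0-9]", "", value.lower())
    for k in known:
        if SequenceMatcher(None, key, k).ratio() >= threshold:
            return k
    known.append(key)
    return key


def build_records(texts):
    """Extract each text, then normalize its slot fillers into items."""
    titles, skills = [], []
    records = []
    for text in texts:
        rec = extract(text)
        items = ["title=" + canonical(rec["title"], titles)]
        items += ["skill=" + canonical(s, skills) for s in rec["skills"]]
        records.append(items)
    return records


def mine_rules(records, min_support=2, min_conf=0.6):
    """Mine pairwise rules x -> y as (x, y, support, confidence) tuples."""
    item_counts, pair_counts = Counter(), Counter()
    for items in records:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules = []
    for (a, b), n in pair_counts.items():
        if n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = n / item_counts[x]
            if conf >= min_conf:
                rules.append((x, y, n, conf))
    return sorted(rules)


if __name__ == "__main__":
    for x, y, support, conf in mine_rules(build_records(POSTINGS)):
        print(f"{x} -> {y}  (support={support}, confidence={conf:.2f})")
```

Without the linkage step, "JavaScript" and "java script" would be counted as different skills, and the support for any rule involving them would be split across the two spellings.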

This research was formerly supported by the National Science Foundation through grant IIS-0117308 from the "Information and Data Management" Program.

People

Bishal Barman	Formerly affiliated Ph.D. Student	bbarman [at] apple com
Shruti Bhosale	Formerly affiliated Masters Student	shruti [at] cs utexas edu
Joohyun Kim	Ph.D. Alumni	scimitar [at] cs utexas edu
Nazneen Rajani	Ph.D. Alumni	nrajani [at] cs utexas edu

Publications

Improving Video Activity Recognition using Object Recognition and Text Mining	2012
Tanvi S. Motwani and Raymond J. Mooney, In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI-2012), pp. 600-605, August 2012.
Review Quality Aware Collaborative Filtering	2012
Sindhu Raghavan, Suriya Gunasekar, and Joydeep Ghosh, In Sixth ACM Conference on Recommender Systems (RecSys 2012), pp. 123-130, September 2012.
Extending Bayesian Logic Programs for Plan Recognition and Machine Reading	2011
Sindhu V. Raghavan, Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin.
Spherical Topic Models	2010
Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond J. Mooney, In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), 2010.
Spherical Topic Models	2009
Joseph Reisinger, Austin Waters, Bryan Silverthorn, and Raymond Mooney, In NIPS'09 Workshop: Applications for Topic Models: Text and Beyond, 2009.
Extracting Relations from Text: From Word Sequences to Dependency Paths	2007
Razvan C. Bunescu and Raymond J. Mooney, In Natural Language Processing and Text Mining, A. Kao and S. Poteet (Eds.), pp. 29-44, Berlin 2007. Springer Verlag.
Learning for Information Extraction: From Named Entity Recognition and Disambiguation To Relation Extraction	2007
Razvan Constantin Bunescu, PhD Thesis, Department of Computer Sciences, University of Texas at Austin. 150 pages. Also as Technical Report AI07-345, Artificial Intelligence Lab, University of Texas at Austin, August 2007.
Learning to Extract Relations from the Web using Minimal Supervision	2007
Razvan C. Bunescu and Raymond J. Mooney, In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL'07), Prague, Czech Republic, June 2007.
Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping	2005
Mikhail Bilenko, Sugato Basu, and Mehran Sahami, In Proceedings of the 5th International Conference on Data Mining (ICDM-2005), pp. 58-65, Houston, TX, November 2005.
Alignments and String Similarity in Information Integration: A Random Field Approach	2005
Mikhail Bilenko and Raymond J. Mooney, In Proceedings of the 2005 Dagstuhl Seminar on Machine Learning for the Semantic Web, Dagstuhl, Germany, February 2005.
Learning for Collective Information Extraction	2005
Razvan C. Bunescu, Technical Report TR-05-02, Department of Computer Sciences, University of Texas at Austin. Ph.D. proposal.
Mining Knowledge from Text Using Information Extraction	2005
Raymond J. Mooney and R. Bunescu, SIGKDD Explorations (special issue on Text Mining and Natural Language Processing), Vol. 7, 1 (2005), pp. 3-10.
Learnable Similarity Functions and Their Applications to Clustering and Record Linkage	2004
Mikhail Bilenko, In Proceedings of the Ninth AAAI/SIGART Doctoral Consortium, pp. 981-982, San Jose, CA, July 2004.
Text Mining with Information Extraction	2004
Un Yong Nahm, PhD Thesis, Department of Computer Sciences, University of Texas at Austin. 217 pages. Also appears as Technical Report UT-AI-TR-04-311.
Using Soft-Matching Mined Rules to Improve Information Extraction	2004
Un Yong Nahm and Raymond J. Mooney, In Proceedings of the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), pp. 27-32, San Jose, CA, July 2004.
Adaptive Duplicate Detection Using Learnable String Similarity Measures	2003
Mikhail Bilenko and Raymond J. Mooney, In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 39-48, Washington, DC, August 2003.
Adaptive Name-Matching in Information Integration	2003
Mikhail Bilenko, William W. Cohen, Stephen Fienberg, Raymond J. Mooney, and Pradeep Ravikumar, IEEE Intelligent Systems, Vol. 18, 5 (2003), pp. 16-23.
Employing Trainable String Similarity Metrics for Information Integration	2003
Mikhail Bilenko and Raymond J. Mooney, In Proceedings of the IJCAI-03 Workshop on Information Integration on the Web, pp. 67-72, Acapulco, Mexico, August 2003.
Learnable Similarity Functions and Their Applications to Record Linkage and Clustering	2003
Mikhail Bilenko, unpublished. Doctoral Dissertation Proposal, University of Texas at Austin.
On Evaluation and Training-Set Construction for Duplicate Detection	2003
Mikhail Bilenko and Raymond J. Mooney, In Proceedings of the KDD-03 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, pp. 7-12, Washington, DC, August 2003.
Text Mining with Information Extraction	2003
Raymond J. Mooney and Un Yong Nahm, In Multilingualism and Electronic Language Management: Proceedings of the 4th International MIDP Colloquium, W. Daelemans and T. du Plessis and C. Snyman and L. Teck (Eds.), pp. 141-160, Bloemf...
Learning to Combine Trained Distance Metrics for Duplicate Detection in Databases	2002
Mikhail Bilenko and Raymond J. Mooney, Technical Report AI 02-296, Artificial Intelligence Laboratory, University of Texas at Austin.
Mining Soft-Matching Association Rules	2002
Un Yong Nahm and Raymond J. Mooney, In Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM-2002), pp. 681-683, McLean, VA, November 2002.
Text Mining with Information Extraction	2002
Un Yong Nahm and Raymond J. Mooney, In Proceedings of the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases, pp. 60-67, Stanford, CA, March 2002.
Two Approaches to Handling Noisy Variation in Text Mining	2002
Un Yong Nahm, Mikhail Bilenko, and Raymond J. Mooney, In Papers from the Nineteenth International Conference on Machine Learning (ICML-2002) Workshop on Text Learning, pp. 18-27, Sydney, Australia, July 2002.
Evaluating the Novelty of Text-Mined Rules using Lexical Knowledge	2001
Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupuleti, and Joydeep Ghosh, In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001), pp. 233-239, San Francisco, CA, 2001.
Mining Soft-Matching Rules from Textual Data	2001
Un Yong Nahm and Raymond J. Mooney, In Proceedings of the 18th International Joint Conference on Artificial Intelligence, 2001.
Text Mining with Information Extraction	2001
Un Yong Nahm, unpublished. Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin.
Using Lexical Knowledge to Evaluate the Novelty of Rules Mined from Text	2001
Sugato Basu, Raymond J. Mooney, Krupakar V. Pasupuleti, and Joydeep Ghosh, In Proceedings of NAACL 2001 Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, pp. 144-149, Pittsburgh, PA, June 2001.
A Mutually Beneficial Integration of Data Mining and Information Extraction	2000
Un Yong Nahm and Raymond J. Mooney, In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), pp. 627-632, Austin, TX, July 2000.
Using Information Extraction to Aid the Discovery of Prediction Rules from Text	2000
Un Yong Nahm and Raymond J. Mooney, In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000) Workshop on Text Mining, pp. 51-58, Boston, MA, August 2000.

Labs

Machine Learning