UTCS Artificial Intelligence
A Mutually Beneficial Integration of Data Mining and Information Extraction (2000)
Un Yong Nahm and Raymond J. Mooney
Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX that combines IE and data mining methodologies to perform text mining and to improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
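The feedback loop described in the abstract (mine rules from already-extracted records, then use them to propose extra fillers for new documents) can be illustrated with a small sketch. This is not the DiscoTEX implementation: the single-antecedent rule miner, the `min_support` and `min_confidence` thresholds, the `(slot, value)` record format, and the check that a predicted value literally appears in the document text are all assumptions made here for illustration.

```python
from collections import Counter


def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine simple rules 'item a present -> item b present'.

    records: list of sets of (slot, value) pairs, one set per document.
    This is a hypothetical stand-in for a real rule learner.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for items in records:
        for a in items:
            item_counts[a] += 1
            for b in items:
                if a != b:
                    pair_counts[(a, b)] += 1
    rules = []
    for (a, b), n in pair_counts.items():
        # Keep rules that are both frequent and reliable in the database.
        if n >= min_support and n / item_counts[a] >= min_confidence:
            rules.append((a, b))
    return sorted(rules)


def augment(extracted, document_text, rules):
    """Add rule-predicted fillers to an extraction result.

    Assumption: a predicted (slot, value) is accepted only when the value
    string occurs in the document, so the rules can raise recall without
    inventing fillers the text does not support.
    """
    result = set(extracted)
    text = document_text.lower()
    for a, b in rules:
        if a in extracted and b not in result and b[1].lower() in text:
            result.add(b)
    return result


# Toy database of records extracted from earlier job postings.
records = [
    {("language", "java"), ("area", "web")},
    {("language", "java"), ("area", "web")},
    {("language", "java"), ("area", "web"), ("platform", "unix")},
    {("language", "c++"), ("platform", "unix")},
]
rules = mine_rules(records)

# The extractor found only the language; a mined rule proposes the area.
posting = "Seeking Java developer for Web applications."
augmented = augment({("language", "java")}, posting, rules)
```

In this toy run the rule "language java implies area web" holds in every record that mentions Java, so the missed filler `("area", "web")` is added to the new posting's record because the word "Web" occurs in its text.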
Citation: In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), 627-632, Austin, TX, July 2000.
BibTeX:
@InProceedings{nahm:aaai00,
  title={A Mutually Beneficial Integration of Data Mining and Information Extraction},
  author={Un Yong Nahm and Raymond J. Mooney},
  booktitle={Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00)},
  month={July},
  address={Austin, TX},
  key={DiscoTEX, KDD, IE},
  pages={627-632},
  url={http://www.cs.utexas.edu/users/ai-lab/pub-view.php?PubID=51415},
  year={2000}
}
People
Raymond J. Mooney
Professor
mooney@cs.utexas.edu
Un Yong Nahm
Alumni
pebronia@acm.org
Areas of Interest
Information Extraction
Natural Language Learning
Text Data Mining
Machine Learning
Labs
Machine Learning