- A Mutually Beneficial Integration of Data Mining and Information Extraction
Un Yong Nahm and Raymond J. Mooney
Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-2000), Austin, TX, pp. 627-632, July 2000.
Paper ID: 100
Category: Information Extraction, Natural Language Learning, Text Data Mining
Text mining concerns applying data mining techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX that combines IE and data mining methodologies both to perform text mining and to improve the performance of the underlying extraction system. Rules mined from a database extracted from a corpus of texts are used to predict additional information to extract from future documents, thereby improving the recall of IE. Encouraging results are presented on applying these techniques to a corpus of computer job postings from an Internet newsgroup.
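
The interplay between mined rules and extraction recall can be illustrated with a small sketch. This is not the DiscoTEX implementation: the function names (`mine_rules`, `augment`), the slot names, the thresholds, and the toy records are all hypothetical, and simple co-occurrence counting stands in for whatever rule learner the system actually uses. The sketch also assumes that a predicted filler is accepted only when its string appears in the document text.

```python
from collections import Counter
from itertools import permutations


def mine_rules(records, min_support=2, min_confidence=0.8):
    """Mine single-antecedent rules a -> b from sets of (slot, filler) items.

    Simple co-occurrence counting; a stand-in for a real rule learner.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for items in records:
        item_counts.update(items)
        # Every ordered pair (a, b) that co-occurs in one record.
        pair_counts.update(permutations(sorted(items), 2))
    rules = []
    for (a, b), n in pair_counts.items():
        if n >= min_support and n / item_counts[a] >= min_confidence:
            rules.append((a, b))
    return sorted(rules)


def augment(extracted, text, rules):
    """Add rule-predicted items the extractor missed.

    Assumption: a prediction is kept only if its filler occurs in the text.
    """
    result = set(extracted)
    lowered = text.lower()
    for a, b in rules:
        filler = b[1]
        if a in extracted and b not in result and filler.lower() in lowered:
            result.add(b)
    return result


# Hypothetical database already extracted from earlier job postings.
records = [
    {("language", "java"), ("area", "web")},
    {("language", "java"), ("area", "web")},
    {("language", "java"), ("area", "web"), ("platform", "unix")},
    {("language", "c++"), ("platform", "unix")},
]
rules = mine_rules(records)

# A new posting where the extractor found the language but missed the area.
new_text = "Seeking a Java developer for web applications."
new_extracted = {("language", "java")}
augmented = augment(new_extracted, new_text, rules)
```

In this toy setting the rule from `("language", "java")` to `("area", "web")` recovers the missed slot, which is the sense in which mined rules can raise recall; checking the filler against the text is one way to limit the precision cost of such predictions.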

mooney@cs.utexas.edu