- Using Information Extraction to Aid the Discovery of Prediction Rules from Text
Un Yong Nahm and Raymond J. Mooney
Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000) Workshop on Text Mining, pp. 51-58, Boston, MA, August 2000
Paper ID: 101
Category: Text Data Mining
Text mining and Information Extraction(IE) are both topics of significant recent interest. Text mining concerns applying data mining, a.k.a. knowledge discovery from databases (KDD) techniques to unstructured text. Information extraction (IE) is a form of shallow text understanding that locates specific pieces of data in natural language documents, transforming unstructured text into a structured database. This paper describes a system called DiscoTEX, that combines IE and KDD methods to perform a text mining task, discovering prediction rules from natural-language corpora. An initial version of DiscoTEX is constructed by integrating an IE module based on Rapier and a rule-learning module, Ripper. We present encouraging results on applying these techniques to a corpus of computer job postings from an Internet newsgroup.

mooney@cs.utexas.edu