Using Soft-Matching Mined Rules to Improve Information Extraction (2004)
By discovering predictive relationships between different pieces of extracted data, data-mining algorithms can be used to improve the accuracy of information extraction. However, textual variation due to typos, abbreviations, and other sources can prevent the productive discovery and utilization of hard-matching rules. Recent methods for inducing soft-matching rules from extracted data can more effectively find and exploit predictive relationships in textual data. This paper presents techniques for using mined soft-matching association rules to increase the accuracy of information extraction. Experimental results on a corpus of computer-science job postings demonstrate that soft-matching rules improve information extraction more effectively than hard-matching rules.
In Proceedings of the AAAI-2004 Workshop on Adaptive Text Extraction and Mining (ATEM-2004), pp. 27-32, San Jose, CA, July 2004.
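
The contrast the abstract draws between hard-matching and soft-matching rules can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the rule, the slot names, the 0.8 threshold, and the use of Python's difflib as the similarity measure are hypothetical and are not taken from the paper, which may use a different similarity metric.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def hard_match(value: str, pattern: str) -> bool:
    """Hard matching: the extracted string must equal the rule's string exactly."""
    return value == pattern

def soft_match(value: str, pattern: str, threshold: float = 0.8) -> bool:
    """Soft matching: the strings only need to be similar enough.
    The threshold is an illustrative assumption."""
    return similarity(value, pattern) >= threshold

# Hypothetical mined rule: postings that mention "JavaScript" in the
# language slot tend to mention "web development" in the area slot.
RULE = {
    "if_slot": "language",
    "if_value": "JavaScript",
    "then_slot": "area",
    "then_value": "web development",
}

def rule_fires(extracted: dict, rule: dict, matcher) -> bool:
    """Check whether any filler of the rule's antecedent slot matches."""
    fillers = extracted.get(rule["if_slot"], [])
    return any(matcher(v, rule["if_value"]) for v in fillers)

def suggest_fillers(extracted: dict, rule: dict, matcher) -> list:
    """If the rule fires, propose its consequent as a candidate filler."""
    if rule_fires(extracted, rule, matcher):
        return [(rule["then_slot"], rule["then_value"])]
    return []

# A posting where the extractor produced a spelling variant.
posting = {"language": ["Java Script", "SQL"]}

hard_suggestions = suggest_fillers(posting, RULE, hard_match)
soft_suggestions = suggest_fillers(posting, RULE, soft_match)
```

In this sketch the hard-matching version misses the variant "Java Script" and proposes nothing, while the soft-matching version fires and proposes the rule's consequent as a candidate filler for the area slot.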

Raymond J. Mooney, Faculty, mooney [at] cs utexas edu
Un Yong Nahm, Ph.D. Alumni, pebronia [at] acm org