- Mining Soft-Matching Association Rules
Un Yong Nahm and Raymond J. Mooney
Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM-2002) (short paper), pp. 681-683, McLean, VA, November 2002.
Paper ID: 117
Category: Text Data Mining
Variation and noise in database entries can prevent data mining algorithms, such as association rule mining, from discovering important regularities. In particular, textual fields can exhibit variation due to typographical errors, misspellings, abbreviations, etc. By allowing partial or "soft" matching of items based on a similarity metric such as edit distance or cosine similarity, additional important patterns can be detected. This paper introduces an algorithm, SoftApriori, that discovers soft-matching association rules given a user-supplied similarity metric for each field. Experimental results on several "noisy" datasets extracted from text demonstrate that SoftApriori discovers additional relationships that more accurately reflect regularities in the data.
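The idea of soft matching can be illustrated with a minimal sketch. This is not the SoftApriori algorithm from the paper; it only shows how support and confidence might be counted when two items match whenever their similarity reaches a threshold. The function names, the normalized edit-distance similarity, the threshold of 0.8, and the toy records are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed metric: 1 minus edit distance normalized by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def soft_contains(record: set, item: str, threshold: float) -> bool:
    """True if some item in the record is similar enough to `item`."""
    return any(similarity(item, other) >= threshold for other in record)


def soft_support(itemset: set, records: list, threshold: float) -> float:
    """Fraction of records that soft-contain every item of the itemset."""
    hits = sum(
        1 for record in records
        if all(soft_contains(record, item, threshold) for item in itemset)
    )
    return hits / len(records)


def soft_confidence(antecedent: set, consequent: set,
                    records: list, threshold: float) -> float:
    """Soft support of the whole rule divided by that of its antecedent."""
    base = soft_support(antecedent, records, threshold)
    if base == 0:
        return 0.0
    return soft_support(antecedent | consequent, records, threshold) / base


# Hypothetical noisy records with a misspelling and a truncated value.
records = [
    {"microsoft", "windows"},
    {"microsft", "windows"},
    {"microsoft", "window"},
    {"oracle", "unix"},
]

exact = soft_confidence({"microsoft"}, {"windows"}, records, threshold=1.0)
soft = soft_confidence({"microsoft"}, {"windows"}, records, threshold=0.8)
print(exact, soft)
```

With a threshold of 1.0 the sketch reduces to exact matching, so the variants "microsft" and "window" are treated as unrelated items and the rule looks weaker than it is. Lowering the threshold lets those variants count toward the same rule, which is the effect the abstract describes.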

mooney@cs.utexas.edu