Mining soft-matching association rules

  • Authors: Un Yong Nahm; Raymond J. Mooney
  • Affiliations: The University of Texas at Austin, Austin, TX; The University of Texas at Austin, Austin, TX
  • Venue: Proceedings of the eleventh international conference on Information and knowledge management
  • Year: 2002

Abstract

Variation and noise in database entries can prevent data mining algorithms, such as association rule mining, from discovering important regularities. In particular, textual fields can exhibit variation due to typographical errors, misspellings, abbreviations, etc. By allowing partial or "soft" matching of items based on a similarity metric such as edit distance or cosine similarity, additional important patterns can be detected. This paper introduces an algorithm, SoftApriori, that discovers soft-matching association rules given a user-supplied similarity metric for each field. Experimental results on several "noisy" datasets extracted from text demonstrate that SoftApriori discovers additional relationships that more accurately reflect regularities in the data.
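
To make the idea of soft matching concrete, the following is a minimal sketch of how the support of a single item might be counted when near-duplicate strings are allowed to match. It is not the SoftApriori algorithm from the paper, whose details the abstract does not give. The function names, the length-normalized edit-distance similarity, the 0.8 threshold, and the toy records are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed metric: 1 minus edit distance normalized by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def hard_support(item: str, records: list[list[str]]) -> float:
    """Fraction of records that contain the item exactly."""
    return sum(1 for rec in records if item in rec) / len(records)


def soft_support(item: str, records: list[list[str]], threshold: float = 0.8) -> float:
    """Fraction of records with at least one value similar enough to the item."""
    hits = sum(
        1 for rec in records
        if any(similarity(item, value) >= threshold for value in rec)
    )
    return hits / len(records)


# Hypothetical noisy records: the second one misspells "microsoft".
records = [
    ["microsoft", "seattle"],
    ["micrsoft", "seattle"],
    ["microsoft corp", "redmond"],
    ["oracle", "austin"],
]

exact = hard_support("microsoft", records)
soft = soft_support("microsoft", records)
print(exact, soft)
```

On this toy data the misspelled entry counts toward the soft support but not the exact one, so an item that falls below a minimum-support cutoff under exact matching could clear it under soft matching. The abbreviated variant "microsoft corp" still fails the assumed 0.8 threshold, which shows how strongly the outcome depends on the chosen metric and cutoff.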