Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules

  • Authors:
  • S. D. Lee; David W. Cheung; Ben Kao

  • Affiliations:
  • Department of Computer Science, The University of Hong Kong, Hong Kong (sdlee@cs.hku.hk; dcheung@cs.hku.hk; kao@cs.hku.hk)

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 1998

Abstract

By nature, sampling is an appealing technique for data mining, because in most cases approximate solutions may already satisfy the needs of users. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Some studies have addressed the problem of maintaining discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part but also the unchanged part of the original database, which is very large, and hence they take much time. Worse yet, if updates are performed frequently on the database but the underlying rule set has not changed much, then the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether or not we should update the mined association rules. If the estimated difference is small, then the rules in the original database are still a good approximation to those in the updated database. Hence, we do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overhead of updating the rules too frequently. Experimental results show that our algorithm is very efficient and highly accurate.
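The following Python sketch illustrates the decision described above. It is not the authors' algorithm, which the abstract does not spell out. The sketch draws a uniform random sample of the updated database and mines the large itemsets in that sample with a naive level-wise (Apriori-style) pass. It then measures how far those itemsets are from the previously mined ones. Several elements are hypothetical choices made only for illustration:

- the function names `frequent_itemsets`, `estimated_difference` and `should_remine`;
- the use of the symmetric-difference ratio as the "difference";
- the `tolerance` threshold.

```python
import random
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Naive level-wise (Apriori-style) mining: return every itemset whose
    relative support in `transactions` is at least `min_support`."""
    n = len(transactions)
    if n == 0:
        return set()
    items = sorted({i for t in transactions for i in t})
    result = set()
    level = [frozenset([i]) for i in items]
    k = 1
    while level:
        # Count the support of each candidate k-itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        large = {c for c, cnt in counts.items() if cnt / n >= min_support}
        result |= large
        k += 1
        # Join: merge large (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in large for b in large if len(a | b) == k}
        # Prune: keep candidates whose (k-1)-subsets are all large.
        level = [c for c in candidates
                 if all(frozenset(s) in large for s in combinations(c, k - 1))]
    return result


def estimated_difference(old_large, updated_db, min_support, sample_size, seed=0):
    """Hypothetical estimator: mine a random sample of the updated database
    and return |old XOR new| / |old UNION new| (0.0 = identical, 1.0 = disjoint)."""
    rng = random.Random(seed)
    sample = rng.sample(updated_db, min(sample_size, len(updated_db)))
    est_new_large = frequent_itemsets(sample, min_support)
    union = old_large | est_new_large
    if not union:
        return 0.0
    return len(old_large ^ est_new_large) / len(union)


def should_remine(old_large, updated_db, min_support, sample_size, tolerance, seed=0):
    """Re-mine the rules only when the estimated change exceeds `tolerance`."""
    return estimated_difference(old_large, updated_db, min_support,
                                sample_size, seed) > tolerance


if __name__ == "__main__":
    db = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("bc")]
    old = frequent_itemsets(db, 0.5)
    print(estimated_difference(old, db, 0.5, sample_size=4))       # 0.0
    changed = [frozenset("xy")] * 4
    print(should_remine(old, changed, 0.5, 4, tolerance=0.1))      # True
```

The sketch applies the original support threshold unchanged to the sample. This ignores sampling error, so itemsets whose support lies close to the threshold may be misclassified. A practical estimator would have to account for that uncertainty, for instance by placing confidence bounds on the sampled supports.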