Efficient Progressive Sampling for Association Rules

  • Authors:
  • Srinivasan Parthasarathy

  • Venue:
  • ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
  • Year:
  • 2002

Abstract

In data mining, sampling has often been suggested as an effective tool to reduce the size of the dataset operated on, at some cost to accuracy. However, this loss of accuracy is often difficult to measure and characterize, since the exact nature of the learning curve (accuracy vs. sample size) is parameter- and data-dependent; that is, we do not know a priori what sample size is needed to achieve a desired accuracy on a particular dataset for a particular set of parameters. In this article we propose the use of progressive sampling to determine the required sample size for association rule mining. We first show that a naive application of progressive sampling is not very efficient for association rule mining. We then present a refinement based on equivalence classes that seems to work extremely well in practice and is able to converge to the desired sample size very quickly and very accurately. An additional novelty of our approach is the definition of a support-sensitive, interactive measure of accuracy across progressive samples.
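The abstract does not spell out the algorithm. What follows is only a minimal sketch of the *naive* progressive sampling baseline it mentions, under stated assumptions: a geometric sampling schedule, a simple levelwise (Apriori-style) frequent-itemset miner, and plain Jaccard similarity between the frequent-itemset collections of consecutive samples as the stopping test. The names `progressive_sample`, `frequent_itemsets`, `jaccard`, `growth`, and `target` are hypothetical. The sketch reproduces neither the paper's equivalence-class refinement nor its support-sensitive accuracy measure.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(sample, min_sup, max_len=3):
    """Levelwise (Apriori-style) mining of itemsets with relative support >= min_sup."""
    threshold = min_sup * len(sample)
    counts = Counter(item for t in sample for item in t)
    current = {frozenset([i]) for i, c in counts.items() if c >= threshold}
    result = set(current)
    k = 2
    while current and k <= max_len:
        # Candidate generation: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Support counting over the sample.
        current = {c for c in candidates if sum(1 for t in sample if c <= t) >= threshold}
        result |= current
        k += 1
    return result


def jaccard(a, b):
    """Jaccard similarity of two itemset collections (assumed stopping test)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def progressive_sample(db, min_sup, start=100, growth=2.0, target=0.95, seed=0):
    """Hypothetical naive progressive sampling: grow nested random samples
    geometrically until consecutive frequent-itemset collections agree."""
    shuffled = list(db)
    random.Random(seed).shuffle(shuffled)  # nested samples = prefixes of one shuffle
    size = min(start, len(shuffled))
    prev = frequent_itemsets(shuffled[:size], min_sup)
    while size < len(shuffled):
        # Geometric schedule; always advance by at least one transaction.
        next_size = min(len(shuffled), max(size + 1, int(size * growth)))
        cur = frequent_itemsets(shuffled[:next_size], min_sup)
        if jaccard(prev, cur) >= target:
            return next_size, cur
        prev, size = cur, next_size
    return size, prev  # no convergence: fell back to the whole database


db = [frozenset("ab"), frozenset("ac"), frozenset("abc"), frozenset("b")] * 500
size, itemsets = progressive_sample(db, min_sup=0.5)
print(size, sorted("".join(sorted(s)) for s in itemsets))
```

Plain Jaccard similarity treats every itemset as equally important and ignores the support values themselves. This is one reason why a support-sensitive measure, like the one the abstract proposes, may give a more informative convergence signal. Each round also re-mines the whole sample from scratch, which hints at why the naive scheme can be inefficient.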