Progressive sampling for association rules based on sampling error estimation

  • Authors:
  • Kun-Ta Chuang;Ming-Syan Chen;Wen-Chieh Yang

  • Affiliations:
  • Graduate Institute of Communication Engineering, National Taiwan University, Intelligent Business Technology Inc., Taipei, Taiwan, ROC;Graduate Institute of Communication Engineering, National Taiwan University, Intelligent Business Technology Inc., Taipei, Taiwan, ROC;Graduate Institute of Communication Engineering, National Taiwan University, Intelligent Business Technology Inc., Taipei, Taiwan, ROC

  • Venue:
  • PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2005

Abstract

This paper explores a progressive sampling algorithm, called Sampling Error Estimation (SEE), which aims to identify an appropriate sample size for mining association rules. SEE has two advantages over previous work. First, SEE is highly efficient because it determines an appropriate sample size without running an association rule mining algorithm. Second, the sample size SEE identifies is very accurate, meaning that association rule mining can be run efficiently on a sample of this size and still yield a sufficiently accurate result. This accuracy stems from SEE's ability to significantly reduce the influence of randomness by examining several samples of the same size in one database scan. Experiments on various real and synthetic data sets show that SEE substantially improves on previous work in both efficiency and resulting accuracy.
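The abstract does not spell out how the sampling error is computed, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes that the error of a sample is the root-mean-square gap between single-item supports in the sample and in the full database, that samples are drawn by Bernoulli inclusion during one pass over the data, and that the search stops once a larger sample no longer lowers the averaged error by more than a tolerance. The names `rms_error`, `see_sketch`, `replicates`, and `tol` are hypothetical.

```python
import random
from collections import Counter


def rms_error(sample_sup, full_sup):
    """RMS gap between sample and full item supports (an assumed error measure)."""
    if not full_sup:
        return 0.0
    total = sum((sample_sup.get(i, 0.0) - s) ** 2 for i, s in full_sup.items())
    return (total / len(full_sup)) ** 0.5


def see_sketch(db, sizes, replicates=5, tol=0.005, seed=0):
    """Pick a sample size from `sizes` (ascending) using one pass over `db`.

    For every candidate size, `replicates` independent samples are drawn in
    the same scan, and their errors are averaged to damp randomness.
    No association rule mining is run; only item counts are kept.
    """
    rng = random.Random(seed)
    n = len(db)
    full = Counter()
    counts = {(s, r): Counter() for s in sizes for r in range(replicates)}
    drawn = {key: 0 for key in counts}

    for transaction in db:  # the single database scan
        items = set(transaction)
        full.update(items)
        for key in counts:
            # Bernoulli inclusion with expected sample size key[0] (assumption).
            if rng.random() < key[0] / n:
                counts[key].update(items)
                drawn[key] += 1

    full_sup = {i: c / n for i, c in full.items()}
    avg_err = []
    for s in sizes:
        errs = []
        for r in range(replicates):
            m = max(drawn[(s, r)], 1)  # avoid division by zero for empty samples
            sup = {i: c / m for i, c in counts[(s, r)].items()}
            errs.append(rms_error(sup, full_sup))
        avg_err.append(sum(errs) / replicates)

    # Stop at the first size whose successor barely improves the error.
    for idx in range(len(sizes) - 1):
        if avg_err[idx] - avg_err[idx + 1] < tol:
            return sizes[idx], avg_err
    return sizes[-1], avg_err


if __name__ == "__main__":
    gen = random.Random(42)
    # Synthetic transactions: each of 20 items appears with probability 0.3.
    data = [[i for i in range(20) if gen.random() < 0.3] for _ in range(2000)]
    size, errors = see_sketch(data, [100, 200, 400, 800, 2000])
    print("chosen sample size:", size)
    print("average error per size:", errors)
```

Averaging over several same-size samples is what keeps a single lucky or unlucky draw from deciding the stopping point. The stopping rule and the error measure here are placeholders, and the paper's own definitions should be preferred where they differ.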