Estimating the number of frequent itemsets in a large database

  • Authors:
  • Ruoming Jin;Scott McCallen;Yuri Breitbart;Dave Fuhry;Dong Wang

  • Affiliations:
  • Kent State University;Kent State University;Kent State University;Kent State University;Kent State University

  • Venue:
  • Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
  • Year:
  • 2009

Abstract

Estimating the number of frequent itemsets for a minimum support threshold α in a large dataset is of great interest from both theoretical and practical perspectives. However, counting not only the frequent itemsets but even the maximal frequent itemsets is #P-complete. In this study, we provide a theoretical investigation of the sampling estimator, and we discover and prove several of its fundamental and rather surprising properties. We also propose a novel algorithm that estimates the number of frequent itemsets without using sampling. Our detailed experimental results show the accuracy and efficiency of the proposed approach.
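
To make the notion of a sampling estimator concrete, the following is a minimal sketch and not the algorithm or analysis from the paper. It assumes the simplest possible form of the estimator: draw a uniform random sample of transactions without replacement, then count the itemsets whose relative support in the sample is at least α. The function names (`count_frequent_itemsets`, `sampling_estimate`), the level-wise counting routine, and the toy database are all illustrative assumptions.

```python
import random
from itertools import combinations


def count_frequent_itemsets(transactions, alpha):
    """Count non-empty itemsets with relative support >= alpha.

    Uses a plain level-wise (Apriori-style) search; this is an
    illustrative baseline, not the method proposed in the paper.
    """
    min_count = alpha * len(transactions)

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_count]

    total = 0
    while level:
        total += len(level)
        k = len(level[0]) + 1
        frequent = set(level)
        # Join step: unions of two frequent itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c) >= min_count]
    return total


def sampling_estimate(transactions, alpha, sample_size, seed=0):
    """Hypothetical naive estimator: count frequent itemsets in a uniform sample."""
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    return count_frequent_itemsets(sample, alpha)


# Toy database (assumed for illustration only).
db = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
exact = count_frequent_itemsets(db, 0.5)
estimate = sampling_estimate(db, 0.5, sample_size=2, seed=1)
```

On small samples the estimate can differ noticeably from the exact count, because an itemset's support in the sample may fall on either side of α. The behaviour of such estimators is the kind of question the abstract says the paper studies.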