Estimating the number of frequent itemsets in a large database

Authors: Ruoming Jin, Scott McCallen, Yuri Breitbart, Dave Fuhry, Dong Wang
Affiliation: Kent State University (all authors)
Venue: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year: 2009

Abstract

Estimating the number of frequent itemsets at a minimum support α in a large dataset is of great interest from both theoretical and practical perspectives. However, counting the frequent itemsets exactly is #P-complete, and so is counting even the maximal frequent itemsets. In this study, we provide a theoretical investigation of the sampling estimator. We discover and prove several fundamental, and rather surprising, properties of the sampling estimator. We also propose a novel algorithm that estimates the number of frequent itemsets without sampling. Detailed experimental results demonstrate the accuracy and efficiency of the proposed approach.
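To make the estimation problem concrete, here is a minimal Python sketch. It assumes that α is a relative support (a fraction of all transactions) and that transactions are given as Python sets. `count_frequent_itemsets` is a hypothetical helper that counts exactly by Apriori-style level-wise enumeration, which is exponential in the worst case and therefore only practical for small inputs. `sample_estimate` is a hypothetical, naive sampling estimator that simply recounts on a uniform random sample of transactions. Neither function is the authors' algorithm, and the paper's non-sampling estimator is not reproduced here.

```python
import random
from itertools import combinations


def count_frequent_itemsets(transactions, alpha):
    """Exactly count nonempty itemsets whose relative support is >= alpha.

    Hypothetical helper (not from the paper). Uses Apriori-style level-wise
    enumeration: a (k+1)-itemset is only considered if all of its k-subsets
    are frequent, because support is anti-monotone. Exponential in the worst
    case, so use it only on small data.
    """
    n = len(transactions)
    if n == 0:
        return 0
    sets = [frozenset(t) for t in transactions]
    min_count = alpha * n  # absolute support threshold

    def support_count(itemset):
        # Number of transactions that contain every item of `itemset`.
        return sum(1 for t in sets if itemset <= t)

    items = sorted(set().union(*sets))
    level = [frozenset([i]) for i in items
             if support_count(frozenset([i])) >= min_count]
    total = 0
    while level:
        total += len(level)
        frequent = set(level)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == len(a) + 1:
                # Prune step: every k-subset of the candidate must be frequent.
                if all(union - {x} in frequent for x in union):
                    candidates.add(union)
        level = [c for c in candidates if support_count(c) >= min_count]
    return total


def sample_estimate(transactions, alpha, fraction, seed=0):
    """Naive sampling estimator (hypothetical, for illustration only).

    Draws a uniform sample of transactions without replacement and counts
    the itemsets that are frequent in the sample at the same relative
    support alpha. The result is a random variable: itemsets whose true
    support is near alpha may land on either side of the threshold.
    """
    rng = random.Random(seed)
    data = list(transactions)
    k = min(len(data), max(1, round(fraction * len(data))))
    sample = rng.sample(data, k)
    return count_frequent_itemsets(sample, alpha)


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    print(count_frequent_itemsets(data, 0.6))  # 6
    print(sample_estimate(data, 0.6, fraction=0.6, seed=1))
```

On the five-transaction toy dataset, α = 0.6 requires a support count of at least 3, which gives six frequent itemsets: {a}, {b}, {c}, {a, b}, {a, c} and {b, c}. The itemset {a, b, c} appears in only two transactions, so it falls below the threshold. Lowering α to 0.4 admits it and raises the count to 7. A sampled count can differ from these exact values because per-itemset support fluctuates in the sample.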