A Randomized Approach for Approximating the Number of Frequent Sets

Authors:
Mario Boley; Henrik Grosskreutz
Venue:
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Cited by:
Power-law based estimation of set similarity join size (Proceedings of the VLDB Endowment)
Output space sampling for graph patterns (Proceedings of the VLDB Endowment)
Stratified k-means clustering over a deep web data source (Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining)


Abstract

We investigate the problem of counting the number of frequent (item)sets, a problem known to be intractable in terms of exact polynomial-time computation. In this paper, we show that it is in general also hard to approximate. Subsequently, a randomized counting algorithm is developed using the Markov chain Monte Carlo method. While for general inputs an exponential running time is needed in order to guarantee a certain approximation bound, we empirically show that the algorithm still has the desired accuracy on real-world datasets when its running time is capped polynomially.
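
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how Markov chain Monte Carlo counting of frequent sets can work in principle, not the authors' method. It assumes a textbook construction: a lazy random walk that toggles one item at a time and rejects infrequent candidates (which has a uniform stationary distribution over the frequent sets), combined with a telescoping product over item prefixes. All function names, parameters, and the toy data are hypothetical, and the fixed step count mirrors the idea of capping the running time rather than guaranteeing convergence.

```python
import random


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def sample_frequent_set(items, transactions, min_support, steps, rng):
    """Run a lazy item-toggling walk restricted to frequent sets over `items`.

    The proposal is symmetric and infrequent candidates are rejected, so the
    stationary distribution is uniform over the frequent sets. `steps` is a
    fixed cap (an assumption of this sketch), not a proven mixing time.
    """
    current = frozenset()  # assumed frequent: min_support <= len(transactions)
    for _ in range(steps):
        if rng.random() < 0.5:
            continue  # lazy step keeps the chain aperiodic
        candidate = current ^ {rng.choice(items)}  # toggle one random item
        if support(candidate, transactions) >= min_support:
            current = candidate
    return current


def estimate_frequent_set_count(items, transactions, min_support,
                                samples=300, steps=50, seed=0):
    """Estimate the number of frequent sets by a telescoping product.

    Let F_i be the frequent sets using only the first i items. F_0 holds just
    the empty set, and |F_{i-1}| / |F_i| equals the probability that a uniform
    sample from F_i omits item i. Frequent sets are closed under taking
    subsets, so this probability is at least 1/2 and can be estimated from a
    modest number of samples.
    """
    rng = random.Random(seed)
    estimate = 1.0
    for i in range(1, len(items) + 1):
        prefix, newest = items[:i], items[i - 1]
        omitted = sum(
            newest not in sample_frequent_set(
                prefix, transactions, min_support, steps, rng)
            for _ in range(samples)
        )
        estimate *= samples / max(omitted, 1)  # guard against division by zero
    return estimate


if __name__ == "__main__":
    # Toy data (hypothetical): with min_support=2 the exact count is 8.
    toy = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abcd")]
    print(estimate_frequent_set_count(list("abcd"), toy, min_support=2))
```

On the toy data the frequent sets are the empty set, the three single items a, b, c, the three pairs, and abc, so the estimate should land near 8 up to sampling noise. The number of steps needed for the walk to mix is exactly what becomes expensive on adversarial inputs, which is consistent with the abstract's remark that guarantees in general require exponential running time.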