Power-law based estimation of set similarity join size
Proceedings of the VLDB Endowment
Output space sampling for graph patterns
Proceedings of the VLDB Endowment
Stratified k-means clustering over a deep web data source
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
We investigate the problem of counting the number of frequent (item)sets---a problem known to be intractable in terms of an exact polynomial time computation. In this paper, we show that it is in general also hard to approximate. Subsequently, a randomized counting algorithm is developed using the Markov chain Monte Carlo method. While for general inputs an exponential running time is needed in order to guarantee a certain approximation bound, we empirically show that the algorithm still has the desired accuracy on real-world datasets when its running time is capped polynomially.