Approximating the number of frequent sets in dense data

Authors:
Mario Boley;Henrik Grosskreutz
Affiliations:
Fraunhofer IAIS, Schloss Birlinghoven, 53754, Sankt Augustin, Germany;Fraunhofer IAIS, Schloss Birlinghoven, 53754, Sankt Augustin, Germany
Venue:
Knowledge and Information Systems
Year:
2009



Abstract

We investigate the problem of counting the number of frequent (item)sets—a problem known to be intractable in terms of an exact polynomial time computation. In this paper, we show that it is in general also hard to approximate. Subsequently, a randomized counting algorithm is developed using the Markov chain Monte Carlo method. While for general inputs an exponential running time is needed in order to guarantee a certain approximation bound, we show that the algorithm still has the desired accuracy on several real-world datasets when its running time is capped polynomially.
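
The abstract gives no algorithmic details, so the following is only a minimal sketch of how Markov chain Monte Carlo counting of frequent sets can work in principle. It is not necessarily the authors' procedure. It assumes a simple lazy random walk that toggles one item at a time and rejects moves to infrequent sets, combined with a standard telescoping-product estimator over item prefixes. All function names, the toy parameters `samples` and `steps`, and the fixed step cap are hypothetical choices for illustration. The sketch also assumes `min_support` does not exceed the number of transactions, so that the empty set is frequent.

```python
import random
from itertools import combinations


def is_frequent(itemset, transactions, min_support):
    """True if at least min_support transactions contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) >= min_support


def sample_frequent_set(items, transactions, min_support, steps, rng):
    """Run a lazy random walk on the frequent sets over `items`.

    Proposals are symmetric (toggle one random item), so the stationary
    distribution is uniform. `steps` is a fixed cap, not a proven mixing time.
    """
    current = frozenset()  # assumed frequent, see lead-in
    for _ in range(steps):
        if rng.random() < 0.5:
            continue  # lazy step keeps the chain aperiodic
        proposal = current ^ {rng.choice(items)}  # add or remove one item
        if is_frequent(proposal, transactions, min_support):
            current = proposal  # infrequent proposals are rejected
    return current


def estimate_count(items, transactions, min_support,
                   samples=1000, steps=50, seed=0):
    """Estimate the number of frequent sets as a product of ratios.

    With F_k the frequent sets over the first k items, |F_k| / |F_{k-1}| is
    the inverse of the probability that a uniform sample from F_k does not
    contain item k. Each ratio is estimated from approximate samples.
    """
    rng = random.Random(seed)
    estimate = 1.0  # |F_0| = 1: only the empty set
    for k in range(1, len(items) + 1):
        prefix, new_item = items[:k], items[k - 1]
        without = sum(
            1 for _ in range(samples)
            if new_item not in sample_frequent_set(
                prefix, transactions, min_support, steps, rng)
        )
        estimate *= samples / max(without, 1)  # guard against division by zero
    return estimate


def exact_count(items, transactions, min_support):
    """Brute-force count, feasible only for tiny inputs; used as a reference."""
    return sum(
        1
        for r in range(len(items) + 1)
        for c in combinations(items, r)
        if is_frequent(frozenset(c), transactions, min_support)
    )


if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
            frozenset("bc"), frozenset("abc")]
    print("exact:", exact_count(["a", "b", "c"], data, 3))
    print("estimate:", estimate_count(["a", "b", "c"], data, 3))
```

On the toy data, the estimate can be compared against the brute-force count. The fixed `steps` cap mirrors the idea of limiting the running time polynomially: it gives no general accuracy guarantee, since the chain may need far more steps to mix on adversarial inputs.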