Linear space direct pattern sampling using coupling from the past

Authors:
Mario Boley;Sandy Moens;Thomas Gärtner
Affiliations:
Fraunhofer IAIS and University of Bonn, Sankt Augustin, Germany;University of Antwerp, Antwerp, Belgium;Fraunhofer IAIS and University of Bonn, Sankt Augustin, Germany
Venue:
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2012

Abstract

This paper shows how coupling from the past (CFTP) can be used to avoid time and memory bottlenecks in direct local pattern sampling procedures. Such procedures draw controlled amounts of suitably biased samples directly from the pattern space of a given dataset in polynomial time. Previous direct pattern sampling methods can produce patterns in rapid succession after an initial preprocessing phase. This preprocessing phase, however, turns out to be prohibitive in terms of time and memory for many datasets. We show how CFTP can be used to avoid any super-linear preprocessing time and memory requirements. This makes it possible to simulate more complex distributions that were previously intractable. We show for a large number of public real-world datasets that these new algorithms are fast to execute and that their pattern collections outperform previous approaches in both unsupervised and supervised contexts.
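The abstract does not spell out the construction, so the following is only a minimal sketch under our own assumptions, not necessarily the authors' algorithm. It shows one standard way CFTP yields exact samples from a distribution known only through unnormalized weights: a Metropolis chain with uniform proposals coalesces whenever a proposal would be accepted from every state, which can be detected using just an upper bound `w_max` on the weights. No table of all weights is built; only the random moves actually read are stored. The function names `cftp_sample` and `sample_itemset_sq_support`, and the illustrative target (itemsets drawn with probability proportional to squared support, via a pair of records and a uniform subset of their intersection), are hypothetical choices made for this example.

```python
import random
from typing import Callable, List, Tuple


def cftp_sample(n: int, weight: Callable[[int], float], w_max: float,
                rng: random.Random) -> int:
    """Exact sample of an index in range(n) with probability proportional to weight(i).

    Coupling from the past on a Metropolis chain with uniform proposals.
    Assumes 0 <= weight(i) <= w_max for all i and at least one positive
    weight (otherwise the loop never terminates).
    """
    moves: List[Tuple[int, float]] = []  # moves[k] is the update applied at time -(k + 1)
    while True:
        # Draw the random update for the next earlier time step.
        y = rng.randrange(n)
        u = rng.random()
        moves.append((y, u))
        # If u * w_max < w(y), then u * w(x) < w(y) for every state x, so
        # every copy of the chain accepts y: all chains have coalesced.
        if u * w_max < weight(y):
            break
    # Start from the coalesced state and replay the later moves forward to time 0.
    state = moves[-1][0]
    for y, u in reversed(moves[:-1]):
        # Metropolis acceptance: accept y with probability min(1, w(y) / w(state)).
        if u * weight(state) < weight(y):
            state = y
    return state


def sample_itemset_sq_support(data: List[frozenset], rng: random.Random) -> frozenset:
    """Hypothetical illustration: draw an itemset F (the empty set included)
    with probability proportional to support(F) ** 2.

    Step 1 samples an ordered pair of records (i, j) with weight
    2 ** |D_i & D_j| by CFTP, computing pair weights on demand instead of
    tabulating all n * n of them. Step 2 returns a uniform subset of D_i & D_j.
    """
    n = len(data)
    max_len = max(len(d) for d in data)

    def pair_weight(k: int) -> float:
        i, j = divmod(k, n)  # decode an ordered pair from a single index
        return 2.0 ** len(data[i] & data[j])

    k = cftp_sample(n * n, pair_weight, 2.0 ** max_len, rng)
    i, j = divmod(k, n)
    common = data[i] & data[j]
    # Uniform subset: keep each shared item independently with probability 1/2.
    return frozenset(x for x in common if rng.random() < 0.5)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [frozenset("ab"), frozenset("abc"), frozenset("bc")]
    print([sorted(sample_itemset_sq_support(data, rng)) for _ in range(5)])
```

Each backward step coalesces with probability equal to the total weight divided by n·`w_max`, so the expected number of stored moves is n·`w_max` over the total weight. The sketch is therefore efficient only when `w_max` is a reasonably tight bound; in exchange it never materializes the quadratic table of pair weights.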