This paper shows how coupling from the past (CFTP) can be used to avoid time and memory bottlenecks in direct local pattern sampling procedures. Such procedures draw controlled amounts of suitably biased samples directly from the pattern space of a given dataset in polynomial time. Previous direct pattern sampling methods can produce patterns in rapid succession after an initial preprocessing phase. This preprocessing phase, however, turns out to be prohibitive in terms of time and memory for many datasets. We show how CFTP can be used to avoid any super-linear preprocessing and memory requirements. This makes it possible to simulate more complex distributions that were previously intractable. On a large number of public real-world datasets, we show that these new algorithms are fast to execute and that their pattern collections outperform those of previous approaches in both unsupervised and supervised contexts.
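To illustrate the general idea behind CFTP, the following is a minimal sketch of the generic technique on a toy chain. It is not the pattern sampling algorithm of this paper. It assumes a lazy random walk on the states 0..n with a monotone update rule, whose stationary distribution is uniform. The names `update` and `cftp_sample` are hypothetical and chosen only for this illustration.

```python
import random


def update(state, u, n):
    """Monotone update rule (assumed toy chain): step up if u < 0.5,
    otherwise step down, clamped to the range [0, n]."""
    if u < 0.5:
        return min(state + 1, n)
    return max(state - 1, 0)


def cftp_sample(n, rng):
    """Draw one exact sample from the stationary distribution of the toy
    chain by coupling from the past."""
    us = []  # us[t] drives the transition from time -(t+1) to time -t
    horizon = 1
    while True:
        # Extend the randomness further into the past, reusing the values
        # already drawn for the more recent steps.
        while len(us) < horizon:
            us.append(rng.random())
        # Run the lowest and highest states from time -horizon up to time 0
        # with shared randomness; monotonicity means every other start
        # state stays sandwiched between them.
        lo, hi = 0, n
        for t in range(horizon - 1, -1, -1):
            lo = update(lo, us[t], n)
            hi = update(hi, us[t], n)
        if lo == hi:
            return lo  # all start states have coalesced at time 0
        horizon *= 2


if __name__ == "__main__":
    rng = random.Random(0)
    print([cftp_sample(4, rng) for _ in range(10)])
```

The key design point in this sketch is that the random numbers for recent time steps are reused whenever the start time is pushed further back. Drawing fresh randomness on each attempt would bias the result. Only the two extreme states are tracked, so memory use beyond the stored random numbers is constant for this toy chain.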