Direct local pattern sampling by efficient two-step random procedures

Authors:
Mario Boley;Claudio Lucchese;Daniel Paurat;Thomas Gärtner
Affiliations:
Fraunhofer IAIS and University of Bonn, Sankt Augustin, Germany;I.S.T.I.-C.N.R. Pisa, Pisa, Italy;Fraunhofer IAIS and University of Bonn, Sankt Augustin, Germany;Fraunhofer IAIS and University of Bonn, Sankt Augustin, Germany
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

We present several exact and highly scalable local pattern sampling algorithms. They can be used as an alternative to exhaustive local pattern discovery methods (e.g., frequent set mining or optimistic-estimator-based subgroup discovery) and can substantially improve the efficiency as well as the controllability of pattern discovery processes. While previous sampling approaches mainly rely on the Markov chain Monte Carlo method, our procedures are direct, i.e., non-process-simulating, sampling algorithms. The advantages of these direct methods are an almost optimal time complexity per pattern and an exactly controlled distribution of the produced patterns. Specifically, the proposed algorithms can sample (item-)sets according to frequency, area, squared frequency, and a class discriminativity measure. Experiments demonstrate that these procedures can improve the accuracy of pattern-based models to a degree similar to frequent sets, and that they often also lead to substantial gains in scalability.
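
To make the idea of a direct two-step procedure concrete, the following is a minimal Python sketch for the frequency case only. It assumes the scheme is: first draw a transaction with probability proportional to the number of its subsets, then draw a uniformly random subset of that transaction. The function names `sample_itemset_by_frequency` and `empirical_distribution`, and the representation of the dataset as a list of frozensets, are illustrative choices and are not taken from the abstract.

```python
import random
from collections import Counter


def sample_itemset_by_frequency(transactions, rng):
    """Draw one itemset with probability proportional to its support count.

    Assumption (illustrative): transactions is a non-empty list of
    frozensets of sortable items. The empty itemset can be returned,
    since every transaction supports it.
    """
    # Step 1: choose a transaction D with weight 2^|D|, the number of its subsets.
    # For very long transactions these weights become huge; a practical
    # implementation would need to handle that (not addressed in this sketch).
    weights = [2 ** len(t) for t in transactions]
    chosen = rng.choices(transactions, weights=weights, k=1)[0]
    # Step 2: choose a uniform random subset of D by keeping each item
    # independently with probability 1/2.
    return frozenset(item for item in sorted(chosen) if rng.random() < 0.5)


def empirical_distribution(transactions, n_samples, seed=0):
    """Estimate the sampling distribution by repeated independent draws."""
    rng = random.Random(seed)
    counts = Counter(
        sample_itemset_by_frequency(transactions, rng) for _ in range(n_samples)
    )
    return {itemset: c / n_samples for itemset, c in counts.items()}
```

Under these assumptions, an itemset F is returned with probability equal to the sum, over the transactions D that contain F, of (2^|D| / Z) · (1 / 2^|D|) = 1 / Z, where Z is the sum of all weights. That probability is therefore proportional to the number of transactions containing F. No Markov chain is simulated, so each draw is an independent sample from the target distribution.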