We present several exact and highly scalable local pattern sampling algorithms. They can be used as an alternative to exhaustive local pattern discovery methods (e.g., frequent set mining or optimistic-estimator-based subgroup discovery) and can substantially improve the efficiency as well as the controllability of pattern discovery processes. While previous sampling approaches mainly rely on the Markov chain Monte Carlo method, our procedures are direct sampling algorithms, i.e., they do not simulate a process. The advantages of these direct methods are an almost optimal time complexity per pattern and an exactly controlled distribution of the produced patterns. Specifically, the proposed algorithms can sample (item-)sets according to frequency, area, squared frequency, and a class discriminativity measure. Experiments demonstrate that these procedures can improve the accuracy of pattern-based models to a degree similar to frequent sets and often also lead to substantial gains in scalability.
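To make the idea of direct sampling according to frequency concrete, the following is a minimal Python sketch. It assumes a two-step scheme that the abstract itself does not spell out: first draw a transaction t with probability proportional to 2^|t| (the number of its subsets), then draw a uniformly random subset of t. Under this assumption, an itemset F is returned with probability proportional to the number of transactions that contain it. The function name `sample_by_frequency` and the toy data are hypothetical and serve only as illustration.

```python
import random
from collections import Counter


def sample_by_frequency(transactions, rng):
    """Draw one itemset with probability proportional to its support count.

    Assumed two-step scheme (illustrative, not taken from the abstract):
    P(F) = sum over t containing F of (2^|t| / Z) * (1 / 2^|t|) = supp(F) / Z,
    where Z is the sum of 2^|t| over all transactions.
    """
    # Step 1: pick a transaction weighted by the number of its subsets.
    weights = [2 ** len(t) for t in transactions]
    t = rng.choices(transactions, weights=weights, k=1)[0]
    # Step 2: pick a uniform random subset by keeping each item with prob. 1/2.
    return frozenset(item for item in sorted(t) if rng.random() < 0.5)


# Toy data set (hypothetical): three transactions over the items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("c")]
n_samples = 20000
rng = random.Random(0)
counts = Counter(sample_by_frequency(transactions, rng) for _ in range(n_samples))

# {a, b} occurs in two transactions and {a, b, c} in one, so {a, b}
# should be drawn roughly twice as often.
for itemset in (frozenset("ab"), frozenset("abc")):
    print(sorted(itemset), counts[itemset] / n_samples)
```

No Markov chain is simulated here: each sample costs one weighted draw plus one pass over the chosen transaction, and the output distribution is exact rather than approximate. Note that this sketch also returns the empty set, whose support equals the number of transactions; a practical variant might reject it.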