Explora: a multipattern and multistrategy discovery assistant
Advances in knowledge discovery and data mining
Efficient mining of association rules using closed itemset lattices
Information Systems
Transversing itemset lattices with statistical metric pruning
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mining frequent patterns without candidate generation
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
An Algorithm for Multi-relational Discovery of Subgroups
PKDD '97 Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery
Artificial Intelligence: A Modern Approach
Artificial Intelligence: A Modern Approach
Subgroup Discovery with CN2-SD
The Journal of Machine Learning Research
The Journal of Machine Learning Research
Tight Optimistic Estimates for Fast Subgroup Discovery
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
The Chosen Few: On Identifying Valuable Patterns
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Correlated itemset mining in ROC space: a constraint programming approach
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Fast Subgroup Discovery for Continuous Target Concepts
ISMIS '09 Proceedings of the 18th International Symposium on Foundations of Intelligent Systems
Non-redundant Subgroup Discovery Using a Closure System
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
Relevancy in constraint-based subgroup discovery
Proceedings of the 2004 European conference on Constraint-Based Mining and Inductive Databases
An enhanced relevance criterion for more concise supervised pattern discovery
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
We consider a modified version of the top-k subgroup discovery task, where subgroups dominated by other subgroups are discarded. The advantage of this modified task, known as relevant subgroup discovery, is that it avoids redundancy in the outcome. Although it has been applied in many applications, so far no efficient exact algorithm for this task has been proposed. Most existing solutions do not guarantee the exact solution (as a result of the use of non-admissible heuristics), while the only exact solution relies on the explicit storage of the whole search space, which results in prohibitively large memory requirements. In this paper, we present a new top-k relevant subgroup discovery algorithm which overcomes these shortcomings. Our solution is based on the fact that if an iterative deepening approach is applied, the relevance check - which is the root of the problems of all other approaches - can be realized based solely on the best k subgroups visited so far. The approach also allows for the integration of admissible pruning techniques like optimistic estimate pruning. The result is a fast, memory-efficient algorithm which clearly outperforms existing top-k relevant subgroup discovery approaches. Moreover, we analytically and empirically show that it is competitive with simpler approaches which do not consider the relevance criterion.