Subgroup discovery is the task of finding subgroups of a population which exhibit both distributional unusualness and high generality. Due to the non-monotonicity of the corresponding evaluation functions, standard pruning techniques cannot be used for subgroup discovery, requiring the use of optimistic estimate techniques instead. So far, however, optimistic estimate pruning has only been considered for the extremely simple case of a binary target attribute, and no attempt has been made to move beyond suboptimal heuristic optimistic estimates. In this paper, we show that optimistic estimate pruning can be developed into a sound and highly effective pruning approach for subgroup discovery. Based on a precise definition of optimality, we show that previous estimates have been tight only in special cases. Thereafter, we present tight optimistic estimates for the most popular binary and multi-class quality functions, and present a family of increasingly efficient approximations to these optimal functions. As we show in empirical experiments, the use of our newly proposed optimistic estimates can lead to a speed-up of an order of magnitude compared to previous approaches.
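To make the pruning idea concrete, the following is a minimal sketch (not the paper's implementation) of optimistic-estimate pruning in a depth-first subgroup search. It assumes a binary target and the Piatetsky-Shapiro quality q(s) = n·(p − p0), where n is the subgroup size, p its positive share, and p0 the positive share of the whole population; the upper bound used, oe(s) = tp·(1 − p0) with tp the number of positives covered, bounds the quality of every refinement of s (the function names and data layout are illustrative assumptions):

```python
def quality(n, tp, p0):
    """Piatetsky-Shapiro quality n * (p - p0) for a subgroup of size n
    covering tp positive examples, against population positive share p0."""
    if n == 0:
        return 0.0
    return n * (tp / n - p0)

def optimistic_estimate(tp, p0):
    """Upper bound on the quality of any refinement of the subgroup:
    attained by a hypothetical refinement that keeps only the positives."""
    return tp * (1 - p0)

def search(data, p0, conditions, prefix=(), best=(float("-inf"), None)):
    """Depth-first search over conjunctions of the given conditions,
    pruning branches whose optimistic estimate cannot beat the best
    subgroup found so far. Returns (best_quality, best_description)."""
    for i, cond in enumerate(conditions):
        # Rows covered by the current conjunction extended with cond.
        covered = [row for row in data if all(c(row) for c in prefix + (cond,))]
        n = len(covered)
        tp = sum(1 for row in covered if row["target"])
        q = quality(n, tp, p0)
        if q > best[0]:
            best = (q, prefix + (cond,))
        # Recurse only if some refinement could still beat the incumbent.
        if optimistic_estimate(tp, p0) > best[0]:
            best = search(data, p0, conditions[i + 1:], prefix + (cond,), best)
    return best
```

The key property the paper is concerned with is the tightness of `optimistic_estimate`: the closer the bound is to the best quality actually reachable by a refinement, the more branches the check eliminates.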