Tight Optimistic Estimates for Fast Subgroup Discovery

  • Authors:
  • Henrik Grosskreutz, Stefan Rüping, Stefan Wrobel

  • Affiliations:
  • Fraunhofer IAIS, Schloss Birlinghoven, 53754 St. Augustin, Germany (all three authors)

  • Venue:
  • ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
  • Year:
  • 2008


Abstract

Subgroup discovery is the task of finding subgroups of a population which exhibit both distributional unusualness and high generality. Due to the non-monotonicity of the corresponding evaluation functions, standard pruning techniques cannot be used for subgroup discovery, so optimistic estimate techniques are required instead. So far, however, optimistic estimate pruning has only been considered for the extremely simple case of a binary target attribute, and no attempt has been made to move beyond suboptimal heuristic optimistic estimates. In this paper, we show that optimistic estimate pruning can be developed into a sound and highly effective pruning approach for subgroup discovery. Based on a precise definition of optimality, we show that previous estimates have been tight only in special cases. We then present tight optimistic estimates for the most popular binary and multi-class quality functions, together with a family of increasingly efficient approximations to these optimal functions. As our empirical experiments show, the newly proposed optimistic estimates can speed up subgroup discovery by an order of magnitude compared to previous approaches.
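
To make the pruning idea concrete, the following is a minimal sketch of optimistic-estimate pruning for top-1 subgroup discovery with a binary target and the Piatetsky-Shapiro quality q(sg) = n * (p - p0). The bound tp * (1 - p0) used below is the quality of the hypothetical refinement that retains only the covered positives, which is one example of a tight estimate for this quality function; the data layout, helper names, and restriction to equality selectors are illustrative assumptions and not the paper's implementation.

    # Sketch: depth-first subgroup search with optimistic-estimate pruning.
    # Quality: Piatetsky-Shapiro, q(sg) = n * (p - p0).
    # Optimistic estimate: tp * (1 - p0), i.e. the quality of the refinement
    # that keeps only the positives covered by sg (an illustrative tight bound
    # for a binary target, not the paper's exact formulation).

    def subgroup_discovery(records, labels, max_depth=3):
        """records: list of dicts (attribute -> value); labels: list of 0/1 targets."""
        N = len(records)
        p0 = sum(labels) / N                      # positive rate in the whole population

        # Candidate selectors: all (attribute, value) equality conditions.
        selectors = sorted({(a, r[a]) for r in records for a in r})

        best_q, best_sg = float("-inf"), None

        def covered(description):
            return [i for i, r in enumerate(records)
                    if all(r.get(a) == v for a, v in description)]

        def search(description, start):
            nonlocal best_q, best_sg
            idx = covered(description)
            n = len(idx)
            if n == 0:
                return
            tp = sum(labels[i] for i in idx)      # positives covered by the subgroup
            q = n * (tp / n - p0)                 # Piatetsky-Shapiro quality
            if q > best_q:
                best_q, best_sg = q, list(description)

            oe = tp * (1 - p0)                    # optimistic estimate for all refinements
            if oe <= best_q or len(description) >= max_depth:
                return                            # prune: no refinement can beat the incumbent

            for j in range(start, len(selectors)):
                search(description + [selectors[j]], j + 1)

        search([], 0)
        return best_sg, best_q

The design choice illustrated here is the one the abstract motivates: because the quality function is not anti-monotone, a refinement may score higher than its parent, so branches can only be cut when an upper bound on the quality of all refinements (the optimistic estimate) falls to or below the best quality found so far; the tighter that bound, the larger the pruned portion of the search space.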