On subgroup discovery in numerical domains

Authors:
Henrik Grosskreutz; Stefan Rüping
Affiliations:
Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany (both authors)
Venue:
Data Mining and Knowledge Discovery
Year:
2009

Abstract

Subgroup discovery is a knowledge discovery task that aims at finding subgroups of a population with high generality and distributional unusualness. While several subgroup discovery algorithms have been presented in the past, they focus on databases with nominal attributes or rely on discretization to eliminate the numerical attributes. In this paper, we illustrate why replacing numerical attributes with nominal attributes can lead to suboptimal results. We then present a new subgroup discovery algorithm that prunes large parts of the search space by exploiting bounds between related numerical subgroup descriptions. The same algorithm can also be applied to ordinal attributes. In an experimental section, we show that our new pruning scheme yields a large performance gain when more than just a few split points are considered for the numerical attributes.
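
To make the point about discretization concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes the Piatetsky-Shapiro quality n_S * (p_S - p_0) as the measure of generality and unusualness, a single numerical attribute, and a hypothetical toy dataset. The names `ps_quality` and `best_interval` are illustrative only. The search is a brute-force enumeration of all intervals over the candidate split points; it contains no pruning.

```python
from itertools import combinations
from math import inf


def ps_quality(n_sub, pos_sub, n_all, pos_all):
    """Piatetsky-Shapiro quality n_S * (p_S - p_0) (assumed quality function)."""
    if n_sub == 0:
        return 0.0
    return n_sub * (pos_sub / n_sub - pos_all / n_all)


def best_interval(data, split_points):
    """Return (quality, lo, hi) of the best subgroup lo <= x < hi.

    Interval bounds are restricted to the given split points plus
    -inf and +inf. Brute force: every pair of bounds is evaluated.
    """
    n_all = len(data)
    pos_all = sum(label for _, label in data)
    bounds = [-inf] + sorted(split_points) + [inf]
    best = (0.0, -inf, inf)  # the whole population has quality 0
    for lo, hi in combinations(bounds, 2):
        covered = [label for x, label in data if lo <= x < hi]
        q = ps_quality(len(covered), sum(covered), n_all, pos_all)
        if q > best[0]:
            best = (q, lo, hi)
    return best


# Hypothetical toy data: attribute values 1..12, target positive for 4..7.
data = [(x, 1 if 4 <= x <= 7 else 0) for x in range(1, 13)]

# Coarse discretization: two split points, i.e. three equal-width bins.
coarse = best_interval(data, [4.5, 8.5])

# Fine-grained search: one split point between every pair of neighbors.
fine = best_interval(data, [x + 0.5 for x in range(1, 12)])

print("coarse:", coarse)
print("fine:  ", fine)
```

On this toy data the coarse bins cannot isolate the positive region, so the best coarse subgroup covers values 5 to 8 with a quality of about 1.67, while the fine-grained search finds the interval covering values 4 to 7 with a quality of about 2.67. The cost of the finer search grows quadratically with the number of split points in this naive form, which is the kind of overhead a pruning scheme is meant to reduce.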