Subgroup Discovery in Data Sets with Multi-dimensional Responses: A Method and a Case Study in Traumatology

Authors:
Lan Umek;Blaž Zupan;Marko Toplak;Annie Morin;Jean-Hugues Chauchat;Gregor Makovec;Dragica Smrke
Affiliations:
Faculty of Computer and Information Sciences, University of Ljubljana, Slovenia;Faculty of Computer and Information Sciences, University of Ljubljana, Slovenia and Dept. of Human and Mol. Genetics, Baylor College of Medicine, Houston, USA;Faculty of Computer and Information Sciences, University of Ljubljana, Slovenia;IRISA, Université de Rennes 1, Rennes cedex, France 35042;Université de Lyon, ERIC-Lyon 2, Bron Cedex, France 69676;Dept. of Traumatology, University Clinical Centre, Ljubljana, Slovenia;Dept. of Traumatology, University Clinical Centre, Ljubljana, Slovenia
Venue:
AIME '09 Proceedings of the 12th Conference on Artificial Intelligence in Medicine: Artificial Intelligence in Medicine
Year:
2009

Abstract

Biomedical experimental data sets often include many features both at input (descriptions of cases, treatments, or experimental parameters) and at output (outcome descriptions). State-of-the-art data mining techniques can deal with such data, but consider only one output feature at a time, disregarding any dependencies among them. In this paper, we propose a technique that treats many output features simultaneously, aiming to find subgroups of cases that are similar in both input and output space. The method is based on k-medoids clustering and analysis of contingency tables, and reports case subgroups with significant dependency in input and output space. We have used this technique in an exploratory analysis of clinical data on femoral neck fractures. The subgroups discovered in our study were considered meaningful by the participating domain expert, and sparked a number of ideas for hypotheses to be tested experimentally.
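
As a rough illustration of the pipeline the abstract outlines (k-medoids clustering followed by contingency-table analysis), the sketch below clusters cases separately in input and output space, cross-tabulates the two cluster assignments, and flags over-represented cells as candidate subgroups. It is not the authors' implementation. The Hamming distance, the farthest-first initialisation, the Pearson-residual cut-off of 2.0, the toy data, and the function names (`k_medoids`, `contingency_table`, `dependent_cells`) are all assumptions made for illustration; the abstract does not state which significance test the paper uses.

```python
from collections import Counter
from math import sqrt


def hamming(a, b):
    """Number of positions at which two categorical feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def k_medoids(points, k, dist=hamming, max_iter=100):
    """Plain alternating k-medoids; returns (labels, medoid indices). Assumes k <= len(points)."""
    n = len(points)
    # Deterministic farthest-first initialisation (an illustrative choice).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(n),
            key=lambda i: min(dist(points[i], points[m]) for m in medoids)))

    def assign(meds):
        # Each case goes to its nearest medoid (ties go to the lower cluster index).
        return [min(range(k), key=lambda c: dist(p, points[meds[c]]))
                for p in points]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            updated.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(medoids), medoids


def contingency_table(in_labels, out_labels, k_in, k_out):
    """Counts of cases per (input cluster, output cluster) pair."""
    counts = Counter(zip(in_labels, out_labels))
    return [[counts[(i, j)] for j in range(k_out)] for i in range(k_in)]


def dependent_cells(table, threshold=2.0):
    """Cells whose Pearson residual (O - E) / sqrt(E) exceeds the threshold."""
    total = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    hits = []
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row[i] * col[j] / total
            if expected > 0 and (observed - expected) / sqrt(expected) > threshold:
                hits.append((i, j, observed))
    return hits


# Toy data (hypothetical): three binary input features, two binary output features.
inputs = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 0), (1, 0, 0),
          (1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 1, 1)] * 2
outputs = ([(0, 0)] * 5 + [(1, 1)] * 5) * 2

in_labels, _ = k_medoids(inputs, 2)
out_labels, _ = k_medoids(outputs, 2)
table = contingency_table(in_labels, out_labels, 2, 2)
subgroups = dependent_cells(table)
```

Each entry of `subgroups` is a tuple of input cluster, output cluster, and case count, marking a set of cases that cluster together in both spaces more often than independence would predict. On real data, an exact or chi-square test with a correction for multiple comparisons would be a more defensible filter than a fixed residual cut-off.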