Subgroup Discovery for Test Selection: A Novel Approach and Its Application to Breast Cancer Diagnosis

Authors:
Marianne Mueller; Rómer Rosales; Harald Steck; Sriram Krishnan; Bharat Rao; Stefan Kramer
Affiliations:
Institut für Informatik, Technische Universität München, Garching, Germany 85748 (Mueller, Kramer); IKM CAD and Knowledge Solutions, Siemens Healthcare, Malvern, USA 19335 (Rosales, Steck, Krishnan, Rao)
Venue:
IDA '09 Proceedings of the 8th International Symposium on Intelligent Data Analysis: Advances in Intelligent Data Analysis VIII
Year:
2009

Abstract

We propose a new approach to test selection based on the discovery of subgroups of patients sharing the same optimal test, and present its application to breast cancer diagnosis. Subgroups are defined in terms of background information about the patient. For a given patient, we automatically determine the t best subgroups the patient belongs to and select the test proposed by the majority of them. We introduce the concept of prediction quality to measure how accurately a test's outcome reflects the patient's disease status. The quality of a subgroup is then the highest mean prediction quality of its members that can be achieved when the same test is chosen for all of them. Incorporating the quality computation into the search heuristic significantly reduces the search space. In experiments on breast cancer diagnosis data, we show that the approach is faster than the baseline algorithm APRIORI-SD while retaining its accuracy.
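The subgroup quality described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical table `pq` in which `pq[patient][test]` holds a precomputed prediction quality. The abstract does not give its exact definition, so a score in [0, 1] is assumed here. The hypothetical function `subgroup_quality` evaluates each candidate test on all members and keeps the test with the highest mean. The test names "MG" and "US" are placeholders.

```python
from statistics import mean

# Hypothetical layout (not from the paper): pq[patient][test] is a precomputed
# prediction quality, assumed here to be a score in [0, 1] rating how well
# that test's outcome reflects the patient's true disease status.
PredQuality = dict[str, dict[str, float]]


def subgroup_quality(members: list[str], pq: PredQuality,
                     tests: list[str]) -> tuple[str, float]:
    """Return (best_test, quality) for a non-empty subgroup.

    The same test is applied to every member; the quality is the highest
    mean prediction quality over all candidate tests.
    """
    best_test, best_q = "", float("-inf")
    for test in tests:
        q = mean(pq[p][test] for p in members)
        if q > best_q:
            best_test, best_q = test, q
    return best_test, best_q


# Toy data for illustration only; "MG" and "US" are placeholder test names.
EXAMPLE_PQ: PredQuality = {
    "p1": {"MG": 0.9, "US": 0.6},
    "p2": {"MG": 0.4, "US": 0.8},
    "p3": {"MG": 0.7, "US": 0.7},
}
```

On the toy data, "US" has the higher mean over all three patients. "MG" wins on the subgroup `["p1", "p3"]`. The best test for a subgroup therefore depends on which patients its background conditions select.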
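The per-patient decision step can be sketched as a majority vote over the t best subgroups that cover the patient. This is a sketch under stated assumptions, not the authors' implementation. The `Subgroup` class is a hypothetical representation in which a description is a conjunction of attribute = value conditions on background information. The attribute names, test names, and quality values are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subgroup:
    # Hypothetical representation: a conjunction of attribute = value
    # conditions on background information, the test chosen for the
    # subgroup, and the subgroup's quality.
    conditions: tuple[tuple[str, object], ...]
    test: str
    quality: float

    def covers(self, patient: dict) -> bool:
        """True if the patient satisfies every condition of the description."""
        return all(patient.get(attr) == value for attr, value in self.conditions)


def select_test(patient: dict, subgroups: list[Subgroup], t: int) -> Optional[str]:
    """Majority vote over the tests of the t best subgroups covering the patient."""
    covering = [sg for sg in subgroups if sg.covers(patient)]
    if not covering:
        return None  # no subgroup applies; a fallback policy would be needed
    top_t = sorted(covering, key=lambda sg: sg.quality, reverse=True)[:t]
    votes = Counter(sg.test for sg in top_t)
    # most_common() orders equal counts by first occurrence, so a tie is
    # resolved in favour of the test whose best subgroup ranks highest.
    return votes.most_common(1)[0][0]


# Made-up subgroups for illustration only.
EXAMPLE_SUBGROUPS = [
    Subgroup((("density", "high"),), "MRI", 0.82),
    Subgroup((("age_group", "50-60"),), "MG", 0.75),
    Subgroup((("family_history", True),), "MRI", 0.70),
    Subgroup((("density", "low"),), "MG", 0.90),
]
```

The tie-breaking rule and the `None` fallback are design choices of this sketch. The abstract only states that the majority of the t best subgroups decides.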
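The abstract does not say how the quality computation is built into the search heuristic. Purely as an illustration of the general idea, and not as the authors' mechanism, the sketch below uses a standard optimistic-estimate bound. It assumes a minimum subgroup size `min_size` (a hypothetical parameter). For a fixed test, the mean over any subset with at least `min_size` members cannot exceed the mean of the `min_size` largest prediction qualities. Taking the maximum over tests therefore bounds the quality of every admissible refinement. A branch can be skipped when this bound does not exceed a threshold, such as the quality of the current t-th best subgroup. All function names and the `pq` layout are hypothetical.

```python
# Hypothetical layout: pq[patient][test] is a precomputed prediction quality.
PredQuality = dict[str, dict[str, float]]


def optimistic_estimate(members: list[str], pq: PredQuality,
                        tests: list[str], min_size: int) -> float:
    """Upper bound on the quality (best mean over tests) of any refinement
    of `members` that keeps at least `min_size` patients."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    if len(members) < min_size:
        return float("-inf")  # no admissible refinement exists
    bound = float("-inf")
    for test in tests:
        # For a fixed test, the mean over any subset of size >= min_size is
        # at most the mean of the min_size largest values.
        top = sorted((pq[p][test] for p in members), reverse=True)[:min_size]
        bound = max(bound, sum(top) / min_size)
    return bound


def can_prune(members: list[str], pq: PredQuality, tests: list[str],
              min_size: int, threshold: float) -> bool:
    """True if no refinement of the subgroup can exceed `threshold`."""
    return optimistic_estimate(members, pq, tests, min_size) <= threshold
```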