PRIM versus CART in subgroup discovery: When patience is harmful

Authors:
Ameen Abu-Hanna;Barry Nannings;Dave Dongelmans;Arie Hasman
Affiliations:
Department of Medical Informatics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands;Department of Medical Informatics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands;Department of Intensive Care, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands;Department of Medical Informatics, Academic Medical Center, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands
Venue:
Journal of Biomedical Informatics
Year:
2010

Abstract

We systematically compare the established algorithms CART (Classification and Regression Trees) and PRIM (Patient Rule Induction Method) in a subgroup discovery task on a large, real-world, high-dimensional clinical database. Contrary to current conjectures, PRIM's performance was generally inferior to CART's. PRIM often considered "peeling off" a large chunk of data at a value of a relevant discrete ordinal variable unattractive, and as a result ultimately missed an important subgroup. This finding has considerable significance in clinical medicine, where ordinal scores are ubiquitous. PRIM's utility in clinical databases would increase if global information about (ordinal) variables were put to better use and if the search algorithm kept track of alternative solutions.
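
To make the peeling issue concrete, the following is a minimal sketch, not the authors' implementation, of why a single peel on a discrete ordinal variable can remove far more data than a peel on a continuous one. The function name `bottom_peel_fraction`, the peeling fraction `alpha = 0.05`, and both toy variables (including the four-level ordinal score and its level counts) are assumptions made purely for illustration.

```python
import numpy as np

def bottom_peel_fraction(x, alpha=0.05):
    """Fraction of observations removed by one bottom peel on variable x.

    The peel cuts at the alpha-quantile and removes everything at or
    below it. With tied values, the whole tied level is removed at once.
    """
    cut = np.quantile(x, alpha)
    return float(np.mean(x <= cut))

# Hypothetical continuous variable: 1000 distinct values.
continuous = np.linspace(0.0, 1.0, 1000)

# Hypothetical ordinal score with four levels and uneven level sizes.
ordinal = np.repeat([1, 2, 3, 4], [400, 300, 200, 100])

frac_continuous = bottom_peel_fraction(continuous)  # close to alpha
frac_ordinal = bottom_peel_fraction(ordinal)        # the whole lowest level

print(f"continuous peel removes {frac_continuous:.2f} of the data")
print(f"ordinal peel removes {frac_ordinal:.2f} of the data")
```

In this toy setting the continuous peel removes about 5% of the observations, whereas the smallest possible peel on the ordinal score removes its entire lowest level, here 40%. A search that favours small, patient steps may therefore pass over such a peel even when the ordinal variable is relevant, which is the behaviour the abstract describes.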