Risk group detection and survival function estimation for interval coded survival methods

  • Authors:
  • Vanya Van Belle; Patrick Neven; Vernon Harvey; Sabine Van Huffel; Johan A. K. Suykens; Stephen Boyd

  • Affiliations:
  • Department of Electrical Engineering (ESAT-SCD), KU Leuven/iMinds Future Health Department, Leuven, Belgium and Department of Mathematics and Statistics, Liverpool John Moores University, Liverpoo ...;Department of Gynaecological Oncology, University Hospitals Leuven, Leuven, Belgium and Multidisciplinary Breast Centre (MBC), University Hospitals Leuven, Leuven, Belgium;Regional Cancer Centre, Auckland City Hospital, Auckland, New Zealand;Department of Electrical Engineering (ESAT-SCD), KU Leuven/iMinds Future Health Department, Leuven, Belgium;Department of Electrical Engineering (ESAT-SCD), KU Leuven/iMinds Future Health Department, Leuven, Belgium;Department of Electrical Engineering, Stanford University, Stanford, CA, United States

  • Venue:
  • Neurocomputing
  • Year:
  • 2013

Abstract

The highly flexible structure of data mining and machine learning methods often yields models that are difficult to interpret, which hampers their use in domains where interpretability matters. To bridge the gap between advanced modeling techniques and domains that demand interpretable results, interpretability should be built into the design of the technique. The Interval Coded Score index (ICS) is a recently proposed model that satisfies this condition: it automatically detects thresholds on variables to generate score systems. The method was extended to censored data (ICSc), but two problems remain: (i) given a prognostic index, how can observations be assigned to different risk groups? (ii) given the risk groups, how can survival curves be estimated for survival models based on support vector machines or ICS models? This work offers solutions to both problems. For the first, the ICSc model is applied to the prognostic index to detect thresholds on that index. The result is a grouped index that can be interpreted as a risk group indicator. The method is then modified to ensure that observations with a lower prognostic index are allocated to higher risk groups. The second problem is tackled by simultaneously estimating multiple Kaplan-Meier curves, under the constraint that the estimated survival curve for a higher risk group should always be lower than the curve for a lower risk group. The proposed approach is illustrated on the prognosis of breast cancer patients and compared with the proportional hazards model. Both models are comparable with respect to discrimination, but calibration is better for the ICSc risk groups.
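The two steps described in the abstract, grouping by thresholds on a prognostic index and estimating one survival curve per group, can be illustrated with a minimal sketch. It is not the authors' method: the thresholds are assumed to be given rather than detected by ICSc, and the ordering of the curves is enforced by naive pointwise clipping after separate Kaplan-Meier fits, not by the simultaneous constrained estimation the abstract describes. All function names (`assign_risk_groups`, `kaplan_meier`, `enforce_ordering`) are hypothetical.

```python
from bisect import bisect_right


def assign_risk_groups(index_values, thresholds):
    """Map each prognostic index value to a group number 0..len(thresholds).

    Assumption: thresholds are supplied by the caller; the paper detects them
    with the ICSc model, which is not reproduced here.
    """
    cuts = sorted(thresholds)
    return [bisect_right(cuts, v) for v in index_values]


def kaplan_meier(times, events, grid):
    """Standard Kaplan-Meier estimate S(t), evaluated at each time in grid.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv = 1.0
    steps = []  # (event time, survival just after that time)
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        steps.append((t, surv))
    result = []
    for g in grid:
        s = 1.0
        for t, value in steps:
            if t > g:
                break
            s = value
        result.append(s)
    return result


def enforce_ordering(curves_low_to_high_risk):
    """Clip each curve so it never exceeds the curve of the next-lower risk group.

    Assumption: a simple post-hoc stand-in for the joint estimation in the
    paper. Curves must share one time grid and be listed from lowest to
    highest risk.
    """
    ordered = []
    for curve in curves_low_to_high_risk:
        if ordered:
            curve = [min(s, prev) for s, prev in zip(curve, ordered[-1])]
        ordered.append(list(curve))
    return ordered
```

In use, observations would be split by the group labels from `assign_risk_groups`, `kaplan_meier` would be called once per group on a shared time grid, and the resulting curves would be passed to `enforce_ordering` from lowest to highest risk. Because the abstract assigns lower prognostic index values to higher risk groups, the group numbering returned here would need to be reversed to follow that convention.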