Using Kaplan-Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients

  • Authors:
  • Mevlut Ture; Fusun Tokatli; Imran Kurt

  • Affiliations:
  • Trakya University, Medical Faculty, Department of Biostatistics, 22030 Edirne, Turkey; Trakya University, Medical Faculty, Department of Radiation Oncology, Edirne, Turkey; Eskisehir Osmangazi University, Medical Faculty, Department of Biostatistics, Eskisehir, Turkey

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Abstract

Current evidence supports a clear association between clinical and pathologic factors and recurrence-free survival (RFS) in breast cancer patients. The Cox regression model is the most common tool for simultaneously investigating the influence of several factors on patient survival time, but it gives no estimate of the degree of separation between subgroups. We analyze several decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) and use them in addition to the well-known Kaplan-Meier estimates to investigate their predictive power. Five hundred patients were included in the study. Two hundred and seventy-nine of them had complete data for prognostic factors, and the median follow-up was about 40.5 months. First, the decision tree methods were applied to the prognostic factors. Then, according to the multidimensional scaling method, C4.5 (error rate 0.2258 on the training set and 0.3259 under cross-validation) performed slightly better than the other methods in predicting risk factors for recurrence. Tumor size, age at menarche, hormonal therapy, histological grade and axillary nodal status were found to be important risk factors for recurrence. Eight terminal nodes were found and stratified by Kaplan-Meier survival curves. Larger tumor size (≥4.4 cm) and receiving no hormonal therapy, in a small subgroup of patients, were associated with worse prognosis. The five-year RFS was 71.3% in the whole patient population. The sensitivity, specificity and predictive rate of the C4.5 method were 43.8%, 91% and 77.4%, respectively. In this study, C4.5 showed a better degree of separation. We therefore recommend using decision tree methods together with Kaplan-Meier analysis to determine risk factors and the effect of these factors on survival.
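To illustrate the Kaplan-Meier step the abstract refers to, here is a minimal sketch of the product-limit estimator as it might be applied to the patients falling into one terminal node of a decision tree. It is not the authors' code. The function name `kaplan_meier` and the toy follow-up times are hypothetical, chosen only for illustration, and the data are not taken from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, S(time)) pairs, one per distinct event time.
    """
    recurrences = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = recurrences.get(t, 0)
        if d:
            # Multiply by the conditional probability of staying
            # recurrence-free past time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored cases at t both leave the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data for one terminal node (NOT from the study).
node_times = [6, 12, 12, 20, 30, 40]
node_events = [1, 0, 1, 0, 1, 0]

curve = kaplan_meier(node_times, node_events)
for t, s in curve:
    print(f"t={t:>2} months  RFS={s:.3f}")
```

Running the same function once per terminal node would give one curve per subgroup, which is the sense in which the tree's nodes are "stratified" by Kaplan-Meier curves here. In practice a maintained survival-analysis library would be preferable to this hand-rolled version.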