The comparisons of prognostic indexes using data mining techniques and Cox regression analysis in the breast cancer data

  • Authors:
  • Mevlut Ture; Fusun Tokatli; Imran Kurt Omurlu

  • Affiliations:
  • Trakya University, Medical Faculty, Department of Biostatistics, Edirne 22030, Turkey; Trakya University, Medical Faculty, Department of Radiation Oncology, Edirne, Turkey; Trakya University, Medical Faculty, Department of Biostatistics, Edirne 22030, Turkey

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Abstract

The purpose of this study is to determine new prognostic indexes for differentiating subgroups of breast cancer patients with respect to disease-free survival (DFS), using decision tree algorithms (C&RT, CHAID, QUEST, ID3, C4.5 and C5.0) and Cox regression analysis. A retrospective analysis was performed on 381 patients diagnosed with breast cancer. The following factors were assessed: age, menopausal status, age at menarche, family history of cancer, histologic tumor type, tumor quadrant, tumor size, estrogen and progesterone receptor status, histologic and nuclear grade, axillary nodal status, pericapsular involvement of lymph nodes, lymphovascular and perineural invasion, and adjuvant radiotherapy, chemotherapy and hormonal therapy. Based on these prognostic factors, new prognostic indexes were obtained for C&RT, CHAID, QUEST, ID3, C4.5, C5.0 and Cox regression. The prognostic indexes showed a good degree of classification, suggesting that improvement is possible using standard risk factors. Using Random Survival Forests (RSF), we found that C4.5 performed better than C&RT, CHAID, QUEST, ID3, C5.0 and Cox regression in determining risk groups.
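To make the idea of a Cox-derived prognostic index concrete, here is a minimal sketch assuming the usual Cox linear predictor PI = Σ β_j x_j, with patients split into risk groups by cut-offs on PI. The covariate names, coefficients, cut-offs and toy cohort are hypothetical and are not values from the study. To show one way competing indexes can be compared, the sketch scores the index with Harrell's concordance index; this is a swapped-in, commonly used discrimination measure, not the Random Survival Forests evaluation the authors report.

```python
from itertools import combinations

# Hypothetical Cox log-hazard coefficients, NOT estimates from the study.
BETAS = {"tumor_size_cm": 0.30, "positive_nodes": 0.12, "er_positive": -0.50}


def prognostic_index(patient, betas=BETAS):
    """Cox linear predictor PI = sum_j beta_j * x_j (higher PI = higher hazard)."""
    return sum(beta * patient[name] for name, beta in betas.items())


def risk_group(pi, low_cut, high_cut):
    """Map a PI value to a risk group using two (hypothetical) cut-offs."""
    if pi < low_cut:
        return "low"
    if pi < high_cut:
        return "intermediate"
    return "high"


def harrell_c(times, events, scores):
    """Harrell's C for right-censored data (simplified: tied times are skipped).

    A pair is comparable when the subject with the shorter time had an event;
    it is concordant when that subject also has the higher risk score.
    """
    concordant, comparable = 0.0, 0
    for a, b in combinations(range(len(times)), 2):
        # Let i be the subject with the shorter follow-up time.
        i, j = (a, b) if times[a] < times[b] else (b, a)
        if times[i] == times[j] or not events[i]:
            continue  # tied times or censored earlier subject: not comparable here
        comparable += 1
        if scores[i] > scores[j]:
            concordant += 1.0
        elif scores[i] == scores[j]:
            concordant += 0.5  # tied risk scores count as half-concordant
    return concordant / comparable if comparable else float("nan")


# Hypothetical toy cohort: DFS in months; event = 1 means recurrence observed.
patients = [
    {"tumor_size_cm": 1.5, "positive_nodes": 0, "er_positive": 1},
    {"tumor_size_cm": 3.0, "positive_nodes": 4, "er_positive": 0},
    {"tumor_size_cm": 2.0, "positive_nodes": 1, "er_positive": 1},
    {"tumor_size_cm": 5.0, "positive_nodes": 10, "er_positive": 0},
]
dfs_months = [60, 20, 15, 10]
events = [0, 1, 1, 1]

scores = [prognostic_index(p) for p in patients]
groups = [risk_group(s, low_cut=0.0, high_cut=1.0) for s in scores]
print(groups)  # ['low', 'high', 'intermediate', 'high']
print(round(harrell_c(dfs_months, events, scores), 3))  # 0.833
```

A C value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by time to recurrence. The same function can score any index, including tree-derived risk groups coded as ordinal scores (for example low = 0, intermediate = 1, high = 2).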