Survival Prediction in Lung Cancer Treated with Radiotherapy: Bayesian Networks vs. Support Vector Machines in Handling Missing Data

  • Authors:
  • Andre Dekker;Cary Dehing-Oberije;Dirk De Ruysscher;Philippe Lambin;Kartik Komati;Glenn Fung;Shipeng Yu;Andrew Hope;Wilfried De Neve;Yolande Lievens

  • Venue:
  • ICMLA '09 Proceedings of the 2009 International Conference on Machine Learning and Applications
  • Year:
  • 2009

Abstract

Missing data is a given in the medical domain, so machine learning models should perform satisfactorily even when values are missing. Our previous work focused on support vector machines (SVM), but we hypothesize that Bayesian networks (BN) handle missing data better. To test this hypothesis, we trained a BN model and an SVM model for 2-year survival on 322 lung cancer patients and compared their performance on three separate external datasets (35, 47, and 33 patients), each with its own characteristics in terms of missing data. The models used tumor size, clinical T and N stage, involved lymph nodes, and WHO performance as prognostic features. We found that the BN model performed better than the SVM model (AUC 0.77, 0.72, 0.70 vs. 0.71, 0.68, 0.69), especially when tumor size was missing. We conclude that BN models are better suited to the medical domain because they handle missing data better.
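
The mechanism behind the hypothesis can be illustrated with a minimal sketch. It assumes a naive-Bayes-shaped network, which is only one possible BN structure; the abstract does not describe the network actually used. The feature names echo the abstract, but every probability below is a made-up illustrative value, not a result from the study, and the helpers `posterior_alive` and `auc` are hypothetical names. The point is that a BN can sum over the unobserved feature instead of requiring an imputed value, which an SVM would typically need.

```python
from itertools import product

# Illustrative numbers only; NOT taken from the paper.
PRIOR = {"alive": 0.4, "dead": 0.6}  # P(status at 2 years)
CPT = {
    "tumor_size": {  # P(tumor_size | status)
        "alive": {"small": 0.7, "large": 0.3},
        "dead": {"small": 0.4, "large": 0.6},
    },
    "n_stage": {  # P(n_stage | status)
        "alive": {"N0": 0.6, "N+": 0.4},
        "dead": {"N0": 0.3, "N+": 0.7},
    },
}


def posterior_alive(evidence):
    """P(alive | evidence); features absent from `evidence` are marginalized out."""
    missing = [f for f in CPT if f not in evidence]
    scores = {}
    for status, prior in PRIOR.items():
        total = 0.0
        # Sum the joint probability over every value of the missing features.
        domains = [list(CPT[f][status]) for f in missing]
        for values in product(*domains):
            full = dict(evidence, **dict(zip(missing, values)))
            p = prior
            for feature, value in full.items():
                p *= CPT[feature][status][value]
            total += p
        scores[status] = total
    return scores["alive"] / (scores["alive"] + scores["dead"])


def auc(scores, labels):
    """Chance that a random positive case outscores a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(posterior_alive({"tumor_size": "small", "n_stage": "N0"}))
    print(posterior_alive({"n_stage": "N0"}))  # tumor size missing
```

Calling `posterior_alive` without `tumor_size` still returns a valid probability, because the sum over its possible values replaces the missing observation. The `auc` helper is the rank-based form of the metric reported in the abstract and can score such predictions against observed outcomes.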