Missing data imputation in breast cancer prognosis

  • Authors:
  • José M. Jerez;Ignacio Molina;José L. Subirats;Leonardo Franco

  • Affiliations:
  • Escuela Técnica Superior de Ingeniería en Informática, Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain;Escuela Técnica Superior de Ingeniería en Telecomunicación, Departamento de Tecnología Electrónica, Universidad de Málaga, Málaga, Spain;Escuela Técnica Superior de Ingeniería en Informática, Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain;Escuela Técnica Superior de Ingeniería en Informática, Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain

  • Venue:
  • BioMed'06: Proceedings of the 24th IASTED International Conference on Biomedical Engineering
  • Year:
  • 2006

Abstract

Missing data are a common problem in real datasets, and imputation techniques are normally used to alleviate it. In this paper we analyze the performance of two data imputation methods in a task where the aim is to predict the probability of breast cancer relapse. Mean imputation and hot-deck imputation were used to replace the missing values in a dataset containing 3679 patient records. Artificial neural network models were trained on the standard dataset (containing no missing data but a restricted number of cases) and also on the data reconstructed with each of the two imputation methods. The results were analyzed in terms of both predictive accuracy and calibration.
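
The two imputation methods named in the abstract can be illustrated with a minimal sketch. The abstract does not say which hot-deck variant was used, so the version below assumes a nearest-neighbour donor: each incomplete record borrows its missing values from the most similar complete record. The function names `mean_impute` and `hot_deck_impute` and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np


def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def hot_deck_impute(X):
    """Fill each incomplete record from its nearest complete record (the donor).

    Assumption: distance is Euclidean over the features that are observed in
    the incomplete record. Other hot-deck variants pick donors differently.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    if donors.shape[0] == 0:
        raise ValueError("hot-deck imputation needs at least one complete record")
    for i in np.flatnonzero(~complete):
        missing = np.isnan(X[i])
        observed = ~missing
        # Compare only on the features this record actually has.
        dist = np.sqrt(((donors[:, observed] - X[i, observed]) ** 2).sum(axis=1))
        donor = donors[np.argmin(dist)]
        out[i, missing] = donor[missing]
    return out


# Toy data (made up for illustration): two numeric features per patient.
X = np.array([
    [50.0, 2.0],
    [60.0, 4.0],
    [52.0, np.nan],
    [np.nan, 3.8],
])
X_mean = mean_impute(X)
X_hot = hot_deck_impute(X)
```

Mean imputation gives every incomplete record the same column-wise fill value. The hot-deck sketch instead copies values from an actual observed record, so imputed values stay within the range of real observations. In practice the features would normally be scaled before computing distances.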