Missing data imputation in breast cancer prognosis

Authors:
José M. Jerez;Ignacio Molina;José L. Subirats;Leonardo Franco
Affiliations:
Escuela Técnica Superior de Ingeniería en Informática, Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain;Escuela Técnica Superior de Ingeniería en Telecomunicación, Departamento de Tecnología Electrónica, Universidad de Málaga, Málaga, Spain;Escuela Técnica Superior de Ingeniería en Informática, Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain;Escuela Técnica Superior de Ingeniería en Informática, Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
Venue:
BioMed'06: Proceedings of the 24th IASTED International Conference on Biomedical Engineering
Year:
2006

Abstract

Missing data are a common problem in real datasets, and imputation techniques are normally used to alleviate it. In this paper we analyze the performance of two data imputation methods in a task where the aim is to predict the probability of breast cancer relapse. Mean imputation and hot-deck imputation were used to replace the missing values found in a dataset containing 3679 patient records. Artificial neural network models were trained on the standard dataset (containing no missing data but a restricted number of cases) and also on the data reconstructed with each of the two imputation methods. The results were analyzed in terms of both predictive accuracy and calibration.
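
The two imputation strategies named in the abstract, together with the complete-case baseline, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy matrix, and in particular the donor rule for hot-deck imputation (nearest complete record by squared Euclidean distance over the observed features) are assumptions made for illustration, since the abstract does not say which hot-deck variant was used. The sketch also assumes that at least one complete record exists to act as a donor.

```python
import numpy as np


def complete_cases(X):
    """Keep only the records with no missing values (the restricted dataset)."""
    return X[~np.isnan(X).any(axis=1)]


def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = X.astype(float)  # astype returns a copy, so the input is untouched
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def hot_deck_impute(X):
    """Fill each incomplete record from its most similar complete record.

    Assumption: the donor is the complete record with the smallest squared
    Euclidean distance, computed over the features observed in the recipient.
    """
    X = X.astype(float)
    missing = np.isnan(X)
    incomplete = missing.any(axis=1)
    donors = X[~incomplete]
    for i in np.where(incomplete)[0]:
        observed = ~missing[i]
        dist = ((donors[:, observed] - X[i, observed]) ** 2).sum(axis=1)
        X[i, missing[i]] = donors[np.argmin(dist), missing[i]]
    return X


# Hypothetical toy data: 4 records, 3 features, NaN marks a missing value.
X = np.array([
    [50.0, 2.0, 1.0],
    [60.0, np.nan, 0.0],
    [40.0, 3.0, np.nan],
    [62.0, 4.0, 0.0],
])

X_complete = complete_cases(X)
X_mean = mean_impute(X)
X_hot = hot_deck_impute(X)
```

Mean imputation fills every gap in a column with the same value, which shrinks that column's variance, whereas hot-deck imputation copies values that actually occur in the data. In this sketch the distance is computed on raw feature values, so features on large scales dominate the donor choice; standardizing the features first would be a reasonable refinement.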