Uncensoring censored data for machine learning: A likelihood-based approach

Authors:
Ivan Štajduhar; Bojana Dalbelo-Bašić
Affiliations:
Department of Computer Engineering, Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000 Rijeka, Croatia; Department of Electronics, Microelectronics, Computer and Intelligent Systems, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

Over the last decade, various machine learning techniques have been applied to different problems in survival analysis. They were usually adapted to learning from censored survival data by using the information on observation time, either by learning from only parts of the data or by modifying the learning algorithms. Efficient models were established in various fields of clinical medicine and bioinformatics. In this paper, we propose a pre-processing method that adapts censored survival data for use with ordinary machine learning algorithms. This is done by pre-assigning each censored instance a positive or negative outcome according to its features and observation time. The proposed procedure calculates the goodness of fit of each censored instance to both the distribution of positives and the spoiled distribution of negatives in the entire dataset, and relabels that instance accordingly. We thoroughly tested our method empirically in a simulation study and on two real-world medical datasets, using the naive Bayes classifier and decision trees. When compared to one of the popular machine learning (ML) methods dealing with survival data, our method provided good results, especially when applied to heavily censored data.
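
The relabelling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not specify the likelihood model or how the spoiled distribution of negatives is built, so the code assumes independent Gaussian densities per column (in the spirit of naive Bayes) and fits the negative model on known negatives only. The function names `fit_gaussians`, `log_likelihood` and `relabel_censored`, and the row layout `(feature, observation_time)`, are hypothetical.

```python
import math


def fit_gaussians(rows):
    """Return a (mean, variance) pair for each column of `rows`."""
    n = len(rows)
    params = []
    for col in zip(*rows):
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        # Floor the variance so a constant column does not divide by zero.
        params.append((mean, max(var, 1e-6)))
    return params


def log_likelihood(row, params):
    """Log-density of `row` under independent per-column Gaussians (assumption)."""
    total = 0.0
    for v, (mean, var) in zip(row, params):
        total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
    return total


def relabel_censored(positives, negatives, censored):
    """Assign 1 (positive) or 0 (negative) to each censored row.

    Each censored row gets the label of the class model it fits better.
    """
    pos_params = fit_gaussians(positives)
    neg_params = fit_gaussians(negatives)
    labels = []
    for row in censored:
        ll_pos = log_likelihood(row, pos_params)
        ll_neg = log_likelihood(row, neg_params)
        labels.append(1 if ll_pos > ll_neg else 0)
    return labels


# Toy rows of the form (feature, observation_time); values are made up.
positives = [(5.0, 1.0), (6.0, 2.0), (5.5, 1.5)]
negatives = [(1.0, 9.0), (2.0, 10.0), (1.5, 9.5)]
censored = [(5.2, 1.2), (1.8, 8.0)]

labels = relabel_censored(positives, negatives, censored)
print(labels)
```

After relabelling, the censored rows could be merged with the known positives and negatives and passed to an ordinary classifier, which is the use the abstract describes for naive Bayes and decision trees.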