The problem of bias in training data in regression problems in medical decision support

Authors:
B. Mac Namee; P. Cunningham; S. Byrne; O. I. Corrigan
Affiliations:
Department of Computer Science, Trinity College, Dublin 2, Ireland; Department of Computer Science, Trinity College, Dublin 2, Ireland; Department of Pharmaceutics and Pharmaceutical Technology, Trinity College, Dublin 2, Ireland; Department of Pharmaceutics and Pharmaceutical Technology, Trinity College, Dublin 2, Ireland
Venue:
Artificial Intelligence in Medicine
Year:
2002

Abstract

This paper describes a bias problem encountered in a machine learning approach to outcome prediction in anticoagulant drug therapy. The outcome to be predicted is a measure of the patient's clotting time; this measure is continuous, so the prediction task is a regression problem. Artificial neural networks (ANNs) are a powerful mechanism for learning to predict such outcomes from training data. However, experiments have shown that an ANN is biased towards values that occur more commonly in the training data and is thus less likely to be correct in predicting extreme values. This bias in training data for regression problems is similar to the problem of minority classes in classification, although the classification version is well documented and is an ongoing area of research. In this paper, we consider stratified sampling and boosting as solutions to this bias problem and evaluate them on this outcome prediction problem and on two other datasets. Both approaches produce some improvements, with boosting showing the most promise.
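The abstract does not spell out how stratified sampling is applied to a continuous target, so the following is only a minimal illustrative sketch of one common way to do it, not the authors' procedure. It assumes equal-width bins over the target range and resampling with replacement so that every non-empty bin contributes the same number of training cases. The function name `stratified_resample` and its parameters `n_bins`, `per_bin`, and `seed` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_resample(targets, n_bins=5, per_bin=None, seed=0):
    """Return training-case indices drawn equally from each target-value bin.

    Assumption (not from the paper): strata are equal-width bins over the
    observed target range, and each non-empty bin is sampled with replacement.
    """
    rng = random.Random(seed)
    lo, hi = min(targets), max(targets)
    # Fall back to a width of 1.0 when every target has the same value.
    width = (hi - lo) / n_bins or 1.0

    bins = defaultdict(list)
    for i, y in enumerate(targets):
        # Clamp so the maximum target lands in the last bin.
        b = min(int((y - lo) / width), n_bins - 1)
        bins[b].append(i)

    # By default, bring every bin up to the size of the most populated one.
    if per_bin is None:
        per_bin = max(len(members) for members in bins.values())

    indices = []
    for b in sorted(bins):
        # Sampling with replacement oversamples sparse (extreme-value) bins.
        indices.extend(rng.choices(bins[b], k=per_bin))
    return indices
```

For example, with targets clustered around a central value and only one case at each extreme, `stratified_resample(targets, n_bins=3)` returns an index list in which the two extreme cases appear as often as the whole central cluster. A regressor trained on the resampled cases therefore sees extreme values far more often than in the raw data, at the cost of repeating the same few extreme cases.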