The Effect of Training Sample Size on Performance of Mass Detection

Authors:
Michiel Kallenberg; Nico Karssemeijer
Affiliations:
Department of Radiology, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands 6525 GA; Department of Radiology, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands 6525 GA
Venue:
IWDM '08: Proceedings of the 9th International Workshop on Digital Mammography
Year:
2008

Abstract

In the development of a computer-aided detection (CAD) system, a large database of training samples is of major importance. However, because obtaining training samples can be costly, it is useful to evaluate the effect of training sample size on the performance of mass detection and classification. In this paper we investigate the effect of both the number of masses and the number of normals in the training database. We are particularly interested in the performance of the CAD system operating at high specificity. We use a combination of databases comprising over 5000 cases. Each mammogram is classified multiple times, using neural networks trained with different numbers of training samples. To measure performance, free-response receiver operating characteristic (FROC) curves are computed, and the mean sensitivity in the interval between 0.05 and 0.5 false positive (FP) marks per image is taken as the performance measure. Performance increased steadily as masses were added to the training database; even with 555 mass cases a plateau was not yet reached. For normal cases, however, a large number was not needed: maximal performance was reached with around 700 cases. These results show that optimal training requires many malignant cases, whereas the number of normal cases has less influence.
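
The performance measure described in the abstract, mean sensitivity over a fixed range of false positive marks per image, can be illustrated with a minimal sketch. The abstract does not say how the average is computed, so the approach below is an assumption: the FROC curve is linearly interpolated between its sampled operating points and averaged by trapezoidal integration, on either a linear or a logarithmic FP axis. The function names, the `n_points` parameter, and the sample data are hypothetical and are not taken from the paper.

```python
import math


def interpolate(x, xs, ys):
    """Linearly interpolate y at x from sorted points (xs, ys); clamp outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def mean_sensitivity(fp_per_image, sensitivity, lo=0.05, hi=0.5,
                     n_points=200, log_scale=False):
    """Mean sensitivity of an FROC curve for FP marks/image in [lo, hi].

    Assumption: the curve is piecewise linear between operating points and the
    mean is the trapezoidal integral divided by the interval width, taken on a
    linear FP axis by default or on a log10 FP axis when log_scale is True.
    """
    pairs = sorted(zip(fp_per_image, sensitivity))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    # Integration axis: FP rate itself, or its base-10 logarithm.
    a, b = (math.log10(lo), math.log10(hi)) if log_scale else (lo, hi)
    step = (b - a) / (n_points - 1)
    grid = [a + i * step for i in range(n_points)]

    # Map grid positions back to FP rates before sampling the curve.
    fp_grid = [10 ** g for g in grid] if log_scale else grid
    sens = [interpolate(f, xs, ys) for f in fp_grid]

    # Trapezoidal rule, normalised by the interval width.
    area = sum((sens[i] + sens[i + 1]) * step / 2 for i in range(n_points - 1))
    return area / (b - a)


# Hypothetical FROC operating points (illustrative values only, not from the paper).
fp_rates = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0]
sens_values = [0.30, 0.50, 0.60, 0.70, 0.80, 0.85]
print(mean_sensitivity(fp_rates, sens_values))
print(mean_sensitivity(fp_rates, sens_values, log_scale=True))
```

Averaging on a logarithmic axis gives more weight to the low false positive end of the interval, which may be preferable when the high-specificity regime is the main interest; which variant the authors used is not stated in the abstract.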