The Effect of Training Sample Size on Performance of Mass Detection

  • Authors:
  • Michiel Kallenberg;Nico Karssemeijer

  • Affiliations:
  • Department of Radiology, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands 6525 GA;Department of Radiology, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands 6525 GA

  • Venue:
  • IWDM '08 Proceedings of the 9th international workshop on Digital Mammography
  • Year:
  • 2008

Abstract

In the development of a computer-aided detection (CAD) system, a large database containing training samples is of major importance. However, as obtaining training samples may be costly, it is useful to evaluate the effect of training sample size on the performance of mass detection and classification. In this paper we investigate the effect of the number of masses as well as the number of normals in the training database. In particular, we are interested in the performance of the CAD system operating at high specificity. We use a combination of databases comprising over 5000 cases. Each mammogram is classified multiple times, using neural networks trained with different numbers of training samples. To measure performance, free-response receiver operating characteristic (FROC) curves are computed. The mean sensitivity in the interval between 0.05 and 0.5 false positive (FP) marks/image is taken as the performance measure. It was found that performance increases steadily as masses are added to the training database. Even with 555 mass cases a plateau was not yet reached. For normal cases, however, we found that a large number of normals was not needed: maximal performance was reached with around 700 cases. These results show that optimal training requires many malignant cases, whereas the number of normal cases has less influence.
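
The performance measure described in the abstract, mean sensitivity over the interval from 0.05 to 0.5 FP marks/image, can be illustrated with a minimal sketch. The abstract does not say how the averaging is done, so the code assumes a piecewise-linear FROC curve averaged on a linear FP axis with the trapezoidal rule; averaging on a logarithmic FP axis would give different values. The function names and the sample points are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def interpolate(x, xs, ys):
    """Linearly interpolate the curve (xs, ys) at x.

    xs must be sorted in ascending order. Outside the sampled range the
    nearest endpoint value is used (an assumption of this sketch).
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # first index with xs[i] > x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def mean_sensitivity(fp_per_image, sensitivity, lo=0.05, hi=0.5):
    """Mean sensitivity of an FROC curve for FP marks/image in [lo, hi].

    Assumption: linear FP axis, trapezoidal integration between the
    sampled operating points, then division by the interval width.
    """
    # Integration knots: both interval ends plus every sampled point inside.
    knots = [lo] + [x for x in fp_per_image if lo < x < hi] + [hi]
    values = [interpolate(x, fp_per_image, sensitivity) for x in knots]
    area = sum(
        (knots[i + 1] - knots[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(len(knots) - 1)
    )
    return area / (hi - lo)


if __name__ == "__main__":
    # Hypothetical FROC operating points, for illustration only.
    fp = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0]
    sens = [0.30, 0.50, 0.60, 0.70, 0.80, 0.85]
    print(f"mean sensitivity: {mean_sensitivity(fp, sens):.3f}")
```

Computing this single number for each network, trained with a different number of mass or normal cases, gives one point per training set size, which is how a plateau (or its absence) would show up in a comparison like the one reported here.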