A rule-based scheme for filtering examples from majority class in an imbalanced training set

  • Authors:
  • Jamshid Dehmeshki;Mustafa Karaköy;Manlio Valdivieso Casique

  • Affiliations:
  • Medicsight Plc., London, England;Medicsight Plc., London, England;Medicsight Plc., London, England

  • Venue:
  • MLDM'03 Proceedings of the 3rd international conference on Machine learning and data mining in pattern recognition
  • Year:
  • 2003

Abstract

Developing a Computer-Assisted Detection (CAD) system for the automatic diagnosis of pulmonary nodules in thoracic CT is a highly challenging research area in the medical domain. It requires the successful application of sophisticated, state-of-the-art image processing and pattern recognition technologies. The object recognition and feature extraction phase of such a system generates a very large, imbalanced training set, as is the case in many learning problems in the medical domain. The performance of concept learning systems is traditionally assessed by the percentage of test examples classified correctly, termed accuracy. This measure becomes inappropriate for imbalanced training sets such as this one, where non-nodule (negative) examples outnumber nodule (positive) examples. This paper introduces a mechanism developed for filtering negative examples in the training set so as to remove the 'obvious' ones, and discusses alternative evaluation criteria.
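
The point about accuracy can be illustrated with a small sketch. The abstract does not say which alternative criteria the paper adopts, so the sensitivity, specificity, and geometric mean below are assumptions: measures commonly used for imbalanced data, not necessarily the authors' choice. The class sizes (10 nodules, 990 non-nodules) and all function names are hypothetical.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = nodule)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Fraction of positive (nodule) examples that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    """Fraction of negative (non-nodule) examples that were rejected."""
    return tn / (tn + fp) if (tn + fp) else 0.0

def g_mean(tp, tn, fp, fn):
    """Geometric mean of sensitivity and specificity (assumed criterion)."""
    return math.sqrt(sensitivity(tp, fn) * specificity(tn, fp))

# Hypothetical imbalanced test set: 10 nodules, 990 non-nodules.
y_true = [1] * 10 + [0] * 990
# A trivial classifier that labels every candidate as non-nodule.
y_pred = [0] * 1000

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
sens = sensitivity(counts[0], counts[3])
gm = g_mean(*counts)

print(f"accuracy={acc:.2f} sensitivity={sens:.2f} g-mean={gm:.2f}")
```

In this hypothetical setting the trivial classifier scores 99% accuracy while detecting no nodules at all, which is why a measure that accounts for each class separately is more informative here.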