Oversampling methods for classification of imbalanced breast cancer malignancy data

  • Authors:
  • Bartosz Krawczyk;Łukasz Jeleń;Adam Krzyżak;Thomas Fevens

  • Affiliations:
  • Department of Systems and Computer Networks, Wrocław University of Technology, Wrocław, Poland;Wrocław School of Applied Informatics, Wrocław, Poland, Institute of Agricultural Engineering, Wrocław University of Environmental and Life Sciences, Wrocław, Poland;Department of Computer Science and Software Engineering, Concordia University, West Montréal, Quebec, Canada;Department of Computer Science and Software Engineering, Concordia University, West Montréal, Quebec, Canada

  • Venue:
  • ICCVG'12: Proceedings of the 2012 International Conference on Computer Vision and Graphics
  • Year:
  • 2012

Abstract

During breast cancer malignancy grading, the main problem that directly affects classification is the imbalanced number of cases in the malignancy classes. This poses a challenge for pattern recognition algorithms and leads to a significant decrease in classification accuracy for the minority class. In this paper we present an approach that mitigates this problem. We describe and compare several state-of-the-art methods based on oversampling, i.e. the introduction of artificial objects into the dataset to eliminate the disproportion among classes. We also describe the automatic thresholding and fuzzy c-means algorithms used to segment nuclei in fine needle aspirate images. From the segmented images, a set of 15 features was extracted for use in classification.
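The abstract defines oversampling as the introduction of artificial objects into the dataset. As an illustration only, the following Python sketch implements SMOTE (Synthetic Minority Over-sampling Technique), one widely used oversampling method. The abstract does not say which methods were compared, so this should not be read as the authors' procedure; the function name smote, its parameters, and the toy data are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic artificial minority-class samples, SMOTE-style.

    Hypothetical helper: each new sample is a random point on the line
    segment between a randomly chosen minority sample and one of its
    k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours (Euclidean), excluding base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Usage: oversample a hypothetical minority class until both classes have 8 samples.
majority = [[5.0, 5.0]] * 8
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
balanced_minority = minority + smote(minority, len(majority) - len(minority), k=2, seed=42)
print(len(balanced_minority))  # 8
```

Interpolating between neighbouring minority samples, rather than duplicating existing ones as plain random oversampling does, spreads the new objects over the region the minority class already occupies.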
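The abstract mentions fuzzy c-means for nuclei segmentation without giving details. The Python sketch below shows the standard fuzzy c-means iteration, assuming it is applied to one-dimensional grey levels with two clusters (nuclei and background) and fuzzifier m = 2. The function fuzzy_c_means, its parameters, the pixel values, and the assumption that nuclei form the darker cluster are all hypothetical, not taken from the paper.

```python
import random


def fuzzy_c_means(values, c=2, m=2.0, max_iter=300, tol=1e-6, seed=None):
    """Fuzzy c-means on 1-D data such as pixel grey levels (hypothetical helper).

    Returns (centres, memberships); memberships[i][j] is the degree to which
    values[i] belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n = len(values)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centres = [0.0] * c
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centre update: mean of the data weighted by membership ** m.
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            if total > 0:
                centres[j] = sum(w * x for w, x in zip(weights, values)) / total
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        max_change = 0.0
        for i, x in enumerate(values):
            d = [abs(x - cj) for cj in centres]
            zeros = [j for j in range(c) if d[j] == 0.0]
            if zeros:
                # x lies exactly on one or more centres: share membership among them.
                new_row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                new_row = [1.0 / sum((d[j] / d[k]) ** exponent for k in range(c))
                           for j in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centres, u


# Usage: split hypothetical grey levels into dark (assumed nuclei) and bright (background).
pixels = [12, 15, 10, 14, 11, 200, 205, 198, 210, 202]
centres, memberships = fuzzy_c_means(pixels, c=2, seed=1)
nucleus = min(range(2), key=lambda j: centres[j])  # index of the darker cluster
mask = [row[nucleus] > 0.5 for row in memberships]
print(mask)
```

With two clusters, keeping the pixels whose membership in the nucleus cluster exceeds 0.5 is the same as assigning each pixel to its highest-membership cluster, which turns the fuzzy partition into a crisp segmentation mask.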