A comparison of machine learning methods for the prediction of breast cancer

  • Authors:
  • Sara Silva;Orlando Anunciação;Marco Lotz

  • Affiliations:
  • INESC-ID Lisboa, IST/UTL, Portugal and CISUC, University of Coimbra, Portugal;INESC-ID Lisboa, IST/UTL, Portugal;Tropical Research Institute, Lisbon, Portugal

  • Venue:
  • EvoBIO'11 Proceedings of the 9th European conference on Evolutionary computation, machine learning and data mining in bioinformatics
  • Year:
  • 2011

Abstract

In this work, we compare machine learning methods in an association study, with the goal of finding reliable classifiers that predict the presence or absence of breast cancer based on single nucleotide polymorphisms from the BRCA1, BRCA2 and TP53 genes. We emphasize how misleading some common statistical measures can be when evaluating classifiers whose learning was biased by an unbalanced dataset, as in our case. We then compare and discuss the format of the different solutions in terms of interpretability, revealing a correlation between the size and the performance of the solutions, and identify a small set of preferred features that agree with previously published work. We designate CART regression trees as the best classifiers, in terms of both performance and interpretability, and discuss how to improve the results reported here.
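The point about misleading measures on unbalanced data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 90/10 class split, the trivial classifier that always predicts the majority class, and the helper names `confusion_counts` and `summarize`. It shows how plain accuracy can look strong while sensitivity reveals that no positive sample is ever detected.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and balanced accuracy."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical unbalanced sample: 90 negatives (0) and 10 positives (1).
y_true = [0] * 90 + [1] * 10
# Hypothetical degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

metrics = summarize(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this hypothetical setting, accuracy follows the class proportions rather than any learned signal, whereas sensitivity and balanced accuracy expose the failure on the minority class. This is one reason to report per-class measures alongside accuracy when the classes are unbalanced.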