A comparison of machine learning methods for the prediction of breast cancer

Authors:
Sara Silva;Orlando Anunciação;Marco Lotz
Affiliations:
INESC-ID Lisboa, IST/UTL, Portugal and CISUC, University of Coimbra, Portugal;INESC-ID Lisboa, IST/UTL, Portugal;Tropical Research Institute, Lisbon, Portugal
Venue:
EvoBIO'11: Proceedings of the 9th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
Year:
2011

Abstract

In this work we compare machine learning methods in an association study, with the goal of finding reliable classifiers that predict the presence or absence of breast cancer based on single nucleotide polymorphisms from the BRCA1, BRCA2 and TP53 genes. We emphasize how misleading some common statistical measures can be when evaluating classifiers whose learning was biased by an unbalanced dataset, as in our case. We then compare the format of the different solutions from the point of view of interpretability, revealing a correlation between the size and the performance of the solutions, and identify a small set of preferred features that agree with previously published work. We designate CART regression trees as the best classifiers, in terms of both performance and interpretability, and discuss how to improve the results reported here.
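
The abstract does not name the statistical measures it considers misleading, so the following is only a minimal sketch of the general effect, not the authors' evaluation procedure. It assumes a hypothetical dataset of 90 cases and 10 controls (the real class ratio is not given here) and a trivial classifier that always predicts "cancer". Plain accuracy looks good, while balanced accuracy (the mean of sensitivity and specificity, chosen here purely as an illustration) shows that the classifier has learned nothing about the minority class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Hypothetical unbalanced dataset: 90 cases (label 1), 10 controls (label 0).
# These numbers are an assumption for illustration, not taken from the paper.
y_true = [1] * 90 + [0] * 10

# A degenerate classifier that always predicts the majority class.
y_pred = [1] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy          = {acc:.2f}")
print(f"balanced accuracy = {bal_acc:.2f}")
```

In this hypothetical setting the always-positive classifier scores 0.90 on accuracy but only 0.50 on balanced accuracy, which is the level of random guessing.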