Evolutionary identification of cancer predictors using clustered data: a case study for breast cancer, melanoma, and cancer in the respiratory system

Authors:
Stephan M. Winkler;Michael Affenzeller;Herbert Stekel
Affiliations:
Upper Austria University of Applied Sciences, Hagenberg, Austria;Upper Austria University of Applied Sciences, Hagenberg, Austria;General Hospital Linz, Linz, Austria
Venue:
Proceedings of the 15th annual conference companion on Genetic and evolutionary computation
Year:
2013

Abstract

In this paper we discuss the effects of using pre-clustered data on the identification of estimation models for cancer diagnoses. Based on patient data records that include standard blood parameters, tumor markers, and information about tumor diagnoses, the goal is to identify mathematical models for estimating cancer diagnoses. We apply a hybrid clustering and classification approach that first identifies data clusters (using standard patient data and tumor markers) and then learns prediction models on the basis of these clusters. In the empirical section we analyze the clusters of patient data samples formed using k-means clustering: the optimal number of clusters is identified, and the homogeneity of these clusters is investigated. Several evolutionary modeling approaches implemented in HeuristicLab are then applied to identify estimators for selected cancer diagnoses: linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms), as well as genetic programming. As we show in the results section, the investigated diagnoses of breast cancer, melanoma, and respiratory system cancer can be estimated correctly in up to 84.2%, 80.3%, and 94.1% of the analyzed test cases, respectively; without tumor markers, up to 78.2%, 78%, and 93.3% of the test samples, respectively, are estimated correctly.
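
The two-stage idea described in the abstract (cluster first, then learn one estimator per cluster) can be illustrated with a minimal sketch. This is not the authors' HeuristicLab implementation: the function names `kmeans`, `fit`, and `predict` are hypothetical, the per-cluster estimator is a plain k-nearest neighbor vote with no evolutionary optimization, the number of clusters is passed in rather than selected automatically, and any data fed to it here would be synthetic.

```python
import math
import random
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random samples as initial centroids
    labels = [nearest(p, centroids) for p in points]
    for _ in range(iters):
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels


def fit(points, diagnoses, k):
    """Stage 1: cluster the samples. Stage 2: keep the labeled samples per cluster."""
    centroids, labels = kmeans(points, k)
    clusters = [[] for _ in range(k)]
    for p, d, lab in zip(points, diagnoses, labels):
        clusters[lab].append((p, d))
    return centroids, clusters


def predict(model, point, n_neighbors=3):
    """Route the sample to its nearest cluster, then take a k-NN majority vote there."""
    centroids, clusters = model
    # Fall back to all training samples if the chosen cluster is empty.
    members = clusters[nearest(point, centroids)] or [m for c in clusters for m in c]
    closest = sorted(members, key=lambda m: math.dist(point, m[0]))[:n_neighbors]
    return Counter(d for _, d in closest).most_common(1)[0][0]
```

Calling `fit(points, diagnoses, k)` on a list of feature tuples and a parallel list of diagnosis labels returns a model that `predict(model, point)` can query. Because each vote only considers samples from the same cluster, the same feature pattern can map to different diagnoses in different clusters, which is the kind of effect pre-clustering is meant to capture.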