Wrapper- and ensemble-based feature subset selection methods for biomarker discovery in targeted metabolomics

Authors:
Holger Franken;Rainer Lehmann;Hans-Ulrich Häring;Andreas Fritsche;Norbert Stefan;Andreas Zell
Affiliations:
Center for Bioinformatics, University of Tübingen, Tübingen, Germany;Clinical Chemistry and Pathobiochemistry, Central Laboratory, University Hospital Tübingen, Tübingen, Germany and Paul-Langerhans-Institute Tübingen, German Centre for Diabetes Rese ...;Clinical Chemistry and Pathobiochemistry, Central Laboratory, University Hospital Tübingen, Tübingen, Germany and Paul-Langerhans-Institute Tübingen, German Centre for Diabetes Rese ...;Clinical Chemistry and Pathobiochemistry, Central Laboratory, University Hospital Tübingen, Tübingen, Germany and Paul-Langerhans-Institute Tübingen, German Centre for Diabetes Rese ...;Clinical Chemistry and Pathobiochemistry, Central Laboratory, University Hospital Tübingen, Tübingen, Germany and Paul-Langerhans-Institute Tübingen, German Centre for Diabetes Rese ...;Center for Bioinformatics, University of Tübingen, Tübingen, Germany
Venue:
PRIB'11 Proceedings of the 6th IAPR international conference on Pattern recognition in bioinformatics
Year:
2011

Abstract

Discovering markers that allow accurate classification of metabolically very similar proband groups is a challenging problem. We apply several search heuristics combined with different classifier types to targeted metabolomics data to identify compound subsets that classify plasma samples of insulin-sensitive and insulin-resistant subjects, both suffering from non-alcoholic fatty liver disease. Additionally, we integrate these methods into an ensemble and screen the selected subsets for common features. We investigate which methods appear most suitable for the task, and test feature subsets for robustness and reproducibility. Furthermore, we consider the predictive potential of different compound classes. We find that classifiers fail to discriminate accurately when given the full, unselected feature set, but benefit considerably from feature subset selection. In particular, a Pareto-based multi-objective genetic algorithm detects highly discriminative subsets and outperforms widely used heuristics. When transferred to new data, feature sets assembled by the ensemble approach show greater robustness than those selected by single methods.
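
The two ideas named in the abstract, Pareto-based selection among candidate feature subsets and screening the selected subsets for common features, can be illustrated with a minimal sketch. It is not the authors' implementation: the two objectives (classification error and subset size, both minimized), the vote threshold `min_votes`, the function names, and the toy candidates with metabolite labels `m1` to `m5` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical candidate subsets with made-up error rates (illustration only).
CANDIDATES = [
    {"name": "A", "features": {"m1", "m2", "m3"}, "error": 0.20},
    {"name": "B", "features": {"m1", "m2"}, "error": 0.20},
    {"name": "C", "features": {"m1"}, "error": 0.30},
    {"name": "D", "features": {"m1", "m2", "m4", "m5"}, "error": 0.10},
    {"name": "E", "features": {"m2", "m3", "m4"}, "error": 0.25},
]


def dominates(a, b):
    """True if a is no worse than b in both objectives and better in one."""
    size_a, size_b = len(a["features"]), len(b["features"])
    no_worse = a["error"] <= b["error"] and size_a <= size_b
    better = a["error"] < b["error"] or size_a < size_b
    return no_worse and better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


def consensus_features(subsets, min_votes):
    """Return features that occur in at least min_votes of the subsets."""
    counts = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, n in counts.items() if n >= min_votes)


if __name__ == "__main__":
    front = pareto_front(CANDIDATES)
    print([c["name"] for c in front])  # ['B', 'C', 'D']
    print(consensus_features([c["features"] for c in front], min_votes=2))  # ['m1', 'm2']
```

In this toy example, subsets A and E are dominated by B (same or lower error with fewer features), and the features shared by at least two of the remaining subsets form the consensus set. A genetic algorithm such as NSGA-II would generate and evolve the candidates; here they are fixed by hand to keep the sketch short.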