Empirical metabolite identification via GA feature selection and Bayes classification

Authors:
Paul Anderson;Michael Peterson
Affiliations:
College of Charleston, Charleston, SC;University of Hawaii, Hilo, HI
Venue:
Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Year:
2012



Abstract

Interpretation of nuclear magnetic resonance (NMR) experimental results for metabolomics studies requires intensive signal processing and multivariate data analysis techniques. A critical step in the typical workflow is the identification of significant metabolites, a list that is typically compiled post hoc. Current techniques rely on manual tuning and are built on databases (DBs) of pure compound samples, where the experimental conditions are simulated in the laboratory. Herein, we develop a novel metabolite identification algorithm that combines a Bayes classifier with genetic algorithm (GA) feature selection and is built upon empirical spectroscopic data. This approach captures the inherent variability in experimental data while greatly reducing the need to build DBs of pure compounds. The ability to annotate spectra by learning patterns within empirical data allows the metabolomics community to use existing datasets to improve and extend our method. The feasibility and accuracy of our algorithm are shown by measuring the specificity (0.75) and sensitivity (0.65) on urine-derived 1H NMR spectroscopic data. The GA removes more than 60% of the features without sacrificing accuracy, thus reducing redundant data and removing irrelevant data from the empirical dataset. This increase in efficiency is critical to extending and improving community-annotated identification DBs.
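The general idea of pairing GA feature selection with a Bayes classifier can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the classifier variant, the fitness function, or the GA operators. The Gaussian naive Bayes model, the balanced-accuracy fitness with a small feature-count penalty, the one-point crossover, the bit-flip mutation, all function names, and the synthetic data are assumptions made for the example. The numbers it prints describe the synthetic data only and say nothing about the reported results.

```python
import math
import random

def make_data(n, n_features, informative, rng):
    """Synthetic stand-in for binned spectra: only the first `informative` features carry signal."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label if j < informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def fit_gnb(X, y, mask):
    """Fit per-class priors and Gaussian (mean, variance) pairs on the features kept by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    classes = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for j in cols:
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6  # variance floor
            stats.append((mu, var))
        classes[c] = (math.log(len(rows) / len(y)), stats)
    return cols, classes

def predict(model, x):
    """Return the class with the highest log posterior under the naive Bayes model."""
    cols, classes = model
    best, best_score = None, -math.inf
    for c, (log_prior, stats) in classes.items():
        score = log_prior
        for j, (mu, var) in zip(cols, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

def sens_spec(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP); class 1 is 'present'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def fitness(mask, train, valid):
    """Assumed fitness: balanced accuracy on validation data minus a small feature-count penalty."""
    if not any(mask):
        return 0.0
    model = fit_gnb(train[0], train[1], mask)
    sens, spec = sens_spec(valid[1], [predict(model, x) for x in valid[0]])
    return (sens + spec) / 2 - 0.01 * sum(mask) / len(mask)

def ga_select(train, valid, n_features, pop_size=20, generations=15, p_mut=0.05, rng=None):
    """Evolve boolean feature masks with elitism, one-point crossover, and bit-flip mutation."""
    if rng is None:
        rng = random.Random(0)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, train, valid), reverse=True)
        elite = ranked[: pop_size // 2]          # the top half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, train, valid))

rng = random.Random(42)
train = make_data(80, 10, 3, rng)
valid = make_data(40, 10, 3, rng)
best = ga_select(train, valid, n_features=10, rng=rng)
model = fit_gnb(train[0], train[1], best)
sens, spec = sens_spec(valid[1], [predict(model, x) for x in valid[0]])
print("kept features:", [j for j, keep in enumerate(best) if keep])
print("sensitivity:", round(sens, 2), "specificity:", round(spec, 2))
```

Because the sketch scores the final mask on the same validation split the GA optimized against, its sensitivity and specificity are optimistic; a separate held-out test set would be needed for an honest estimate.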