Empirical metabolite identification via GA feature selection and Bayes classification

  • Authors:
  • Paul Anderson; Michael Peterson

  • Affiliations:
  • College of Charleston, Charleston, SC; University of Hawaii, Hilo, HI

  • Venue:
  • Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
  • Year:
  • 2012

Abstract

Interpretation of nuclear magnetic resonance (NMR) experimental results for metabolomics studies requires intensive signal processing and multivariate data analysis. A critical step in the typical workflow is the identification of significant metabolites, which are typically compiled post hoc. Current techniques rely on manual tuning and are built on databases (DBs) of pure compound samples, for which the experimental conditions must be simulated in the laboratory. Herein, we develop a novel metabolite identification algorithm that uses a Bayes classifier with genetic algorithm (GA) feature selection, built upon empirical spectroscopic data. This approach captures the inherent variability of experimental data while greatly reducing the need to build DBs of pure compounds. Because the method learns to annotate spectra from patterns within empirical data, the metabolomics community can use existing datasets to improve and extend it. We demonstrate the feasibility and accuracy of the algorithm by measuring its specificity (0.75) and sensitivity (0.65) on ¹H NMR spectra derived from urine. The GA removes more than 60% of the features without sacrificing accuracy, thereby reducing redundant data and removing irrelevant data from the empirical dataset. This gain in efficiency is critical to extending and improving community-annotated identification DBs.
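The abstract does not describe the implementation, so the following is only a minimal sketch of how GA feature selection can wrap a Bayes classifier. It assumes that each spectrum is a vector of binned intensities, that the label marks a metabolite as present (1) or absent (0), that the Bayes classifier is Gaussian naive Bayes, and that GA fitness is balanced accuracy (the mean of sensitivity and specificity) on held-out spectra minus a small penalty per retained bin. Every function name, parameter value, and the toy data generator below are hypothetical illustrations, not the authors' code or data.

```python
# Hypothetical sketch: GA feature selection around a Gaussian naive Bayes classifier.
# Not the authors' implementation; names, parameters and toy data are illustrative.
import math
import random


def fit_gaussian_nb(X, y, mask):
    """Estimate class priors plus per-feature Gaussian mean/variance on the masked features."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        stats = []
        for j, keep in enumerate(mask):
            if keep:
                vals = [r[j] for r in rows]
                mu = sum(vals) / len(vals)
                var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6  # avoid zero variance
                stats.append((j, mu, var))
        model[c] = (prior, stats)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (features treated as independent)."""
    def log_post(c):
        prior, stats = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
            for j, mu, var in stats
        )
    return max(model, key=log_post)


def sens_spec(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def ga_select(X_tr, y_tr, X_val, y_val, pop_size=20, generations=30,
              p_mut=0.05, alpha=0.01, seed=0):
    """Evolve boolean feature masks; fitness = balanced accuracy minus a size penalty."""
    rng = random.Random(seed)
    n = len(X_tr[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            if not any(mask):
                cache[key] = -1.0  # an empty mask cannot classify anything
            else:
                model = fit_gaussian_nb(X_tr, y_tr, mask)
                sens, spec = sens_spec(y_val, [predict(model, x) for x in X_val])
                cache[key] = (sens + spec) / 2 - alpha * sum(mask) / n
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [list(elite[0])]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            # bit-flip mutation applied to each gene of the crossed-over child
            child = [not g if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


def make_data(n_samples, rng):
    """Hypothetical toy 'spectra': 2 informative bins followed by 8 pure-noise bins."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 0.5) for _ in range(2)]
                 + [rng.gauss(0.0, 1.0) for _ in range(8)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    X_tr, y_tr = make_data(60, rng)
    X_val, y_val = make_data(40, rng)
    mask = ga_select(X_tr, y_tr, X_val, y_val)
    model = fit_gaussian_nb(X_tr, y_tr, mask)
    sens, spec = sens_spec(y_val, [predict(model, x) for x in X_val])
    print("kept bins:", [j for j, keep in enumerate(mask) if keep])
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The per-feature penalty `alpha` is what pushes the search to drop redundant or irrelevant bins when doing so costs little accuracy. Note that this toy run reports sensitivity and specificity on the same split that guided the search, which is optimistic; a separate test split or cross-validation would give a fairer estimate.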