Genetic algorithms for simultaneous variable and sample selection in metabonomics

  • Authors:
  • Rachel Cavill;Hector C. Keun;Elaine Holmes;John C. Lindon;Jeremy K. Nicholson;Timothy M. D. Ebbels

  • Venue:
  • Bioinformatics
  • Year:
  • 2009

Quantified Score

Hi-index 3.84

Abstract

Motivation: Metabolic profiles derived from high resolution 1H-NMR data are complex, therefore statistical and machine learning approaches are vital for extracting useful information and biological insights. Focused modelling on targeted subsets of metabolites and samples can improve the predictive ability of models, and techniques such as genetic algorithms (GAs) have a proven utility in feature selection problems. The Consortium for Metabonomic Toxicology (COMET) obtained temporal NMR spectra of urine from rats treated with model toxins and stressors. Here, we develop a GA approach which simultaneously selects sets of samples and spectral regions from the COMET database to build robust, predictive classifiers of liver and kidney toxicity.

Results: The results indicate that using simultaneous sample and variable selection improved performance by over 9% compared with either method alone. Simultaneous selection also halved computation time. Successful classifiers repeatedly selected particular variables indicating that this approach can aid defining biomarkers of toxicity. Novel visualizations of the results from multiple computations were developed to aid the interpretability of which samples and variables were frequently selected. This method provides an efficient way to determine the most discriminatory variables and samples for any post-genomic dataset.

Availability: GA code available from http://www1.imperial.ac.uk/medicine/people/r.cavill/

Contact: r.cavill@imperial.ac.uk; t.ebbels@imperial.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
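The idea of selecting samples and variables simultaneously with a GA can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the chromosome encoding, classifier, fitness function, or GA operators, so everything below is an assumption made for illustration. Here a chromosome is a single bit string whose first part masks training samples and whose second part masks variables; fitness is the validation accuracy of a simple nearest-centroid classifier; and the GA uses tournament selection, uniform crossover, bit-flip mutation, and elitism. All function names (`make_toy_data`, `centroid_accuracy`, `fitness`, `run_ga`) and the synthetic data are hypothetical.

```python
import random


def make_toy_data(rng, n, n_vars=8, informative=(0, 1)):
    """Synthetic two-class data (assumption): only `informative` columns differ by class."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_vars)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(train_X, train_y, val_X, val_y, var_idx):
    """Validation accuracy of a nearest-centroid classifier restricted to var_idx columns."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in var_idx]
    correct = 0
    for x, lab in zip(val_X, val_y):
        dist = {c_lab: sum((x[j] - c[k]) ** 2 for k, j in enumerate(var_idx))
                for c_lab, c in centroids.items()}
        correct += (min(dist, key=dist.get) == lab)
    return correct / len(val_y)


def fitness(chrom, X, y, val_X, val_y):
    """Decode one chromosome into a sample mask and a variable mask, then score it."""
    n = len(X)
    samples = [i for i in range(n) if chrom[i]]
    variables = [j for j in range(len(X[0])) if chrom[n + j]]
    # Invalid if no variable is selected or any class has no selected sample.
    if not variables or len({y[i] for i in samples}) < len(set(y)):
        return 0.0
    return centroid_accuracy([X[i] for i in samples], [y[i] for i in samples],
                             val_X, val_y, variables)


def run_ga(X, y, val_X, val_y, pop_size=30, generations=25, p_mut=0.02, seed=0):
    """Evolve joint sample/variable masks; returns (best chromosome, its fitness)."""
    rng = random.Random(seed)
    length = len(X) + len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        scored = sorted(((fitness(c, X, y, val_X, val_y), c) for c in pop),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], scored[0][1][:]
        new_pop = [scored[0][1][:]]  # elitism: carry over the current best unchanged
        while len(new_pop) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            # Uniform crossover followed by bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
    return best, best_fit


if __name__ == "__main__":
    data_rng = random.Random(1)
    X, y = make_toy_data(data_rng, 20)
    val_X, val_y = make_toy_data(data_rng, 20)
    best, best_fit = run_ga(X, y, val_X, val_y)
    print("selected samples:  ", [i for i in range(len(X)) if best[i]])
    print("selected variables:", [j for j in range(len(X[0])) if best[len(X) + j]])
    print("validation accuracy:", best_fit)
```

Encoding both masks in one chromosome means a single GA run searches the two selection problems together rather than running one search per problem. Repeating the run with different seeds and counting how often each sample or variable is selected would give selection frequencies of the kind the abstract describes visualizing, although the sketch does not attempt to reproduce those visualizations or the reported results.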