A machine learning and chemometrics assisted interpretation of spectroscopic data: a NMR-based metabolomics platform for the assessment of Brazilian propolis

  • Authors:
  • Marcelo Maraschin; Amélia Somensi-Zeggio; Simone K. Oliveira; Shirley Kuhnen; Maíra M. Tomazzoli; Ana C. M. Zeri; Rafael Carreira; Miguel Rocha

  • Affiliations:
  • Plant Morphogenesis and Biochemistry Laboratory, Federal University of Santa Catarina, Florianópolis, SC, Brazil, and CCTC, School of Engineering, University of Minho, Braga, Portugal; Plant Morphogenesis and Biochemistry Laboratory, Federal University of Santa Catarina, Florianópolis, SC, Brazil; Plant Morphogenesis and Biochemistry Laboratory, Federal University of Santa Catarina, Florianópolis, SC, Brazil; Plant Morphogenesis and Biochemistry Laboratory, Federal University of Santa Catarina, Florianópolis, SC, Brazil; Plant Morphogenesis and Biochemistry Laboratory, Federal University of Santa Catarina, Florianópolis, SC, Brazil; National Laboratory of Bioscience, Campinas, SP, Brazil; CCTC, School of Engineering, University of Minho, Braga, Portugal; CCTC, School of Engineering, University of Minho, Braga, Portugal

  • Venue:
  • PRIB'12 Proceedings of the 7th IAPR international conference on Pattern Recognition in Bioinformatics
  • Year:
  • 2012

Abstract

In this work, a metabolomics dataset obtained by ¹H nuclear magnetic resonance (NMR) spectroscopy of Brazilian propolis was analyzed using machine learning algorithms, including feature selection and classification methods. Partial least squares discriminant analysis (PLS-DA), random forest (RF), and wrapper methods combining decision trees and rules with evolutionary algorithms (EA) proved to be complementary approaches, providing relevant information on the importance of a given set of features, mostly related to the structural fingerprint of aliphatic and aromatic compounds typically found in propolis, e.g., fatty acids and phenolic compounds. The feature selection and decision tree-based algorithms used appear to be suitable tools for building classification models of Brazilian propolis metabolomic profiles according to geographic origin, with consistency and high accuracy, while avoiding redundant information on the metabolic signature of relevant compounds.
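
The wrapper idea mentioned in the abstract (an evolutionary algorithm searching over feature subsets, scored by a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic stand-ins for binned spectra, a nearest-centroid classifier is swapped in for the decision trees and rules, and all names (`make_data`, `loo_accuracy`, `fitness`, `evolve`) and parameter values are hypothetical choices made for illustration only.

```python
import random

def make_data(rng, n_per_class=15, n_features=12, informative=(2, 7)):
    """Synthetic stand-in for binned spectra: two classes, two informative bins."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 3.0 * label  # class 1 is shifted in the informative bins
            X.append(row)
            y.append(label)
    return X, y

def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the selected bins."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    hits = 0
    for i in range(len(X)):
        dist = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [sum(r[j] for r in rows) / len(rows) for j in cols]
            dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
        hits += min(dist, key=dist.get) == y[i]
    return hits / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small per-feature penalty (assumed value) to discourage redundancy."""
    return loo_accuracy(X, y, mask) - penalty * sum(mask)

def evolve(X, y, rng, pop_size=16, generations=10, mutation_rate=0.1):
    """Simple elitist evolutionary search over binary feature masks."""
    n = len(X[0])
    # Seed the population with the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        elite = ranked[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
        pop = elite + children
    return max(pop, key=lambda m: fitness(X, y, m))

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    best = evolve(X, y, rng)
    print("selected bins:", [j for j, keep in enumerate(best) if keep])
    print("leave-one-out accuracy:", loo_accuracy(X, y, best))
```

Because the best half of each generation is carried over unchanged and the initial population contains the all-features mask, the returned subset never scores worse than using every feature under this fitness function. With real spectra, the classifier and the penalty would need to be chosen and validated against held-out samples.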