Modeling and prediction of peptide drift times in ion mobility spectrometry using sequence-based and structure-based approaches

  • Authors:
  • Yiming Zhang;Quan Jin;Shuting Wang;Ren Ren

  • Affiliations:
  • Hangzhou Center for Disease Control and Prevention (HZCDC), Hangzhou 310021, China;Hangzhou Center for Disease Control and Prevention (HZCDC), Hangzhou 310021, China;Hangzhou Center for Disease Control and Prevention (HZCDC), Hangzhou 310021, China;Hangzhou Center for Disease Control and Prevention (HZCDC), Hangzhou 310021, China

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The mobile behavior of 1481 peptides in ion mobility spectrometry (IMS), which are generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted based on two different types of characterization methods, i.e. sequence-based approach and structure-based approach. In this procedure, the sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, the nonlinear support vector machine (SVM) and Gaussian process (GP) as well as linear partial least squares (PLS) regression is employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validations as well as the rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison on the resulting statistics arising from the different combinations of variable types with modeling methods and find that the sequence-based approach can give the QSSR models with better fitting ability and predictive power but worse interpretability than the structure-based approach. In addition, though the QSSR modeling using sequence-based approach is not needed for the preparation of the minimization structures of peptides before the modeling, it would be considerably efficient as compared to that using structure-based approach.