Empirical knowledge and genetic algorithms for selection of amide I frequencies in protein secondary structure prediction

Authors:
Joachim A. Hering; Peter R. Innocent; Parvez I. Haris
Affiliations:
De Montfort University, Leicester, UK; De Montfort University, Leicester, UK; De Montfort University, Leicester, UK
Venue:
APBC '04 Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29
Year:
2004

Abstract

Here we investigate an extension of a previously suggested "automatic amide I frequency selection procedure", introducing an additional criterion that uses empirical knowledge of regions within the amide I band (1600–1700 cm⁻¹) found to be particularly sensitive to protein secondary structure. We show that the genetic algorithm provides a solution with good protein secondary structure prediction accuracy. Based on an evaluation set of 13 infrared spectra from proteins not contained in the reference set, we demonstrate that our method can make good predictions for proteins it did not see during training. In the present study, where the genetic algorithm is guided towards a solution that selects a higher number of empirically determined, structure-sensitive amide I frequencies, a minor improvement in prediction accuracy for α-helix and β-sheet structure was achieved compared to our previous study, in which no such knowledge was provided. Despite the very limited number of protein spectra in the reference set (18), the neural networks were able to generalize, with an overall average of the standard errors of prediction of 4.36% on the evaluation set of protein spectra, which is even better than that achieved in the analysis based on the reference set of protein spectra (4.8%). This indicates the potential of our approach once more protein infrared spectra become available on which to base the analysis.
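
The abstract does not specify how the empirical-knowledge criterion enters the genetic algorithm, so the following is only a minimal sketch of one way such a criterion could be combined with prediction error in a fitness function. Everything named here is an assumption for illustration: the 1 cm⁻¹ frequency grid, the placeholder ranges in `SENSITIVE_REGIONS` (not the regions used in the paper), the `weight` parameter, the simple selection, crossover and mutation operators, and the caller-supplied `prediction_error` function, which stands in for the neural-network error that the study actually uses.

```python
import random

# Candidate amide I wavenumbers in cm^-1. The 1 cm^-1 step is an assumption.
WAVENUMBERS = list(range(1600, 1701))

# Placeholder "structure-sensitive" regions, for illustration only.
# These are NOT the empirically determined regions from the paper.
SENSITIVE_REGIONS = [(1620, 1640), (1650, 1658)]


def in_sensitive_region(wavenumber):
    """True if the wavenumber falls inside any placeholder region."""
    return any(lo <= wavenumber <= hi for lo, hi in SENSITIVE_REGIONS)


def knowledge_score(mask):
    """Fraction of the selected frequencies that lie in a sensitive region."""
    selected = [w for w, bit in zip(WAVENUMBERS, mask) if bit]
    if not selected:
        return 0.0
    return sum(in_sensitive_region(w) for w in selected) / len(selected)


def fitness(mask, prediction_error, weight=0.1):
    """Higher is better: low prediction error plus a weighted knowledge bonus.

    `prediction_error` is a hypothetical callable mapping a 0/1 mask to an
    error value; in the study this role is played by neural networks.
    """
    if not any(mask):
        return float("-inf")  # selecting no frequencies is not a valid solution
    return -prediction_error(mask) + weight * knowledge_score(mask)


def evolve(prediction_error, pop_size=30, generations=40,
           mutation_rate=0.02, weight=0.1, seed=0):
    """Evolve a 0/1 selection mask over WAVENUMBERS and return the best one."""
    rng = random.Random(seed)
    n = len(WAVENUMBERS)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, prediction_error, weight)

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: carry the best mask over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each position flips with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
    return max(population, key=score)
```

A caller would pass its own error estimate, for example `best = evolve(my_error)`, and read off the chosen frequencies with `[w for w, bit in zip(WAVENUMBERS, best) if bit]`, where `my_error` is a hypothetical function. Raising `weight` pushes the search towards masks concentrated in the sensitive regions at the cost of some prediction error, which is the trade-off the added criterion is meant to control.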