Combining segmental semi-Markov models with neural networks for protein secondary structure prediction

  • Authors:
  • Niranjan P. Bidargaddi;Madhu Chetty;Joarder Kamruzzaman

  • Affiliations:
  • Gippsland School of Information Technology, Monash University, VIC 3842, Australia;Gippsland School of Information Technology, Monash University, VIC 3842, Australia;Gippsland School of Information Technology, Monash University, VIC 3842, Australia

  • Venue:
  • Neurocomputing
  • Year:
  • 2009

Abstract

Motivation: Predicting the secondary structure of a protein from its primary sequence alone has been approached from either a classification or a generative-model perspective. The most prominent classification methods use neural networks that map a local window of residues in the sequence to the structural state of the window's central residue, thereby capturing local interactions effectively. However, they fail to capture long-range interactions among residues. Generative models based on Bayesian segmentation capture sequence-structure relationships using generalized hidden Markov models with explicit state durations. They capture non-local interactions through a joint sequence-structure probability distribution defined over structural segments. In this paper, we investigate a combined architecture, with Bayesian segmentation at the first stage and a neural network at the second stage, that captures both local and non-local correlations in order to increase single-sequence prediction accuracy. The combined architecture is further enhanced with neural network optimization and ensemble techniques.

Results: The proposed architecture has been built and tested on two widely studied databases comprising 480 and 608 protein sequences, respectively. It achieved accuracies above 71%, comparable to the highest accuracies reported so far for single-sequence methods, without using the evolutionary information provided by multiple sequence alignments. The required data sets and program code are available at http://www.gippsland.monash.edu.au/research/publish/neurocomputing.zip.
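The abstract describes the two-stage data flow only at a high level. A neural network reads a local window around each residue, presumably built from the per-residue outputs of the Bayesian segmentation stage, and several networks are combined as an ensemble. The sketch below illustrates that flow under explicit assumptions. The stage-one output is taken to be a per-residue probability vector over three states (H, E, C). Out-of-range window positions are padded with a zero vector plus an indicator. The ensemble is combined by simple probability averaging. Accuracy is measured as the three-state per-residue score (Q3). All function names are hypothetical, and the trained network itself is omitted. This is not the authors' released code.

```python
from typing import List, Sequence

# Assumed three-state alphabet (helix, strand, coil); the exact state set
# used in the paper is not stated in the abstract.
STATES = ("H", "E", "C")


def window_features(probs: Sequence[Sequence[float]], center: int,
                    half_width: int) -> List[float]:
    """Flatten hypothetical stage-one state probabilities in a window.

    probs[i] is assumed to be a probability vector over STATES for residue i.
    Positions outside the sequence are zero-filled and flagged with an extra
    padding indicator, so every window has the same length.
    """
    features: List[float] = []
    for offset in range(-half_width, half_width + 1):
        j = center + offset
        if 0 <= j < len(probs):
            features.extend(float(p) for p in probs[j])
            features.append(0.0)  # position lies inside the sequence
        else:
            features.extend([0.0] * len(STATES))
            features.append(1.0)  # position lies past a sequence end
    return features


def ensemble_average(
        member_outputs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-residue class probabilities across ensemble members."""
    n_members = len(member_outputs)
    n_residues = len(member_outputs[0])
    return [
        [sum(member[i][k] for member in member_outputs) / n_members
         for k in range(len(STATES))]
        for i in range(n_residues)
    ]


def decode(probs: Sequence[Sequence[float]]) -> str:
    """Pick the most probable state for each residue (first one on ties)."""
    return "".join(
        STATES[max(range(len(STATES)), key=lambda k: row[k])] for row in probs
    )


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    stage_one = [[0.7, 0.1, 0.2], [0.6, 0.2, 0.2], [0.2, 0.5, 0.3]]
    print(window_features(stage_one, center=0, half_width=1))
    members = [
        [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]],
        [[0.6, 0.2, 0.2], [0.4, 0.2, 0.4]],
    ]
    labels = decode(ensemble_average(members))
    print(labels, q3(labels, "HC"))
```

With a window half-width of 1, each residue yields 3 × (3 + 1) = 12 inputs. The window around the first residue begins with the padding block `[0, 0, 0, 1]`. The averaged ensemble above decodes to `HE`, which scores 50.0 against the observed labels `HC`. Averaging probabilities is only one way to combine ensemble members. The abstract does not say which combination rule the authors used.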