Predicting Protein Secondary Structure Using Stochastic Tree Grammars

  • Authors:
  • Naoki Abe; Hiroshi Mamitsuka

  • Affiliations:
  • Theory NEC Laboratory, Real World Computing Partnership C & C Media Research Laboratories, NEC Corporation, 4-1-1 Miyazaki, Miyamae-ku, Kawasaki, 216 Japan. E-mail: abe@ccm.cl.nec.co.jp, mam ...;Theory NEC Laboratory, Real World Computing Partnership C & C Media Research Laboratories, NEC Corporation, 4-1-1 Miyazaki, Miyamae-ku, Kawasaki, 216 Japan. E-mail: abe@ccm.cl.nec.co.jp, mam ...

  • Venue:
  • Machine Learning - Special issue on learning with probabilistic representations
  • Year:
  • 1997

Abstract

We propose a new method for predicting the protein secondary structure of a given amino acid sequence, based on a training algorithm for the probability parameters of a stochastic tree grammar. In particular, we concentrate on the problem of predicting β-sheet regions, which has previously been considered difficult because of the unbounded dependencies exhibited by sequences corresponding to β-sheets. To cope with this difficulty, we use a new family of stochastic tree grammars, which we call Stochastic Ranked Node Rewriting Grammars. These grammars are powerful enough to capture the type of dependencies exhibited by the sequences of β-sheet regions, such as the 'parallel' and 'anti-parallel' dependencies and their combinations. The training algorithm we use is an extension of the 'inside-outside' algorithm for stochastic context-free grammars, but with a number of significant modifications. We applied our method to real data obtained from the HSSP database (Homology-derived Secondary Structure of Proteins Ver 1.0), and the results were encouraging: our method was able to predict roughly 75 percent of the β-strands correctly in a systematic evaluation experiment in which the test sequences not only have less than 25 percent identity to the training sequences but are totally unrelated to them. This figure compares favorably to the predictive accuracy of the state-of-the-art prediction methods in the field, even though our experiment was on a restricted type of β-sheet structure and the test was done on a relatively small data set. We also stress that our method can predict the structure as well as the location of β-sheet regions, which was not possible with conventional methods for secondary structure prediction. Extended abstracts of parts of the work presented in this paper have appeared in (Abe & Mamitsuka, 1994) and (Mamitsuka & Abe, 1994).
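For readers unfamiliar with the 'inside-outside' algorithm mentioned in the abstract, the following is a minimal sketch of its 'inside' half for an ordinary stochastic context-free grammar in Chomsky normal form. It is not the authors' extended algorithm for Stochastic Ranked Node Rewriting Grammars. The toy grammar, its probabilities, and the names `BINARY`, `LEXICAL`, and `inside_probability` are assumptions made purely for illustration. The grammar generates even-length palindromes over {a, b}, a nested pairing pattern that is loosely analogous to 'anti-parallel' dependencies.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not taken from the paper).
# It generates even-length palindromes over {a, b}, i.e. nested pairings.
BINARY = {  # (parent, left child, right child) -> rule probability
    ("S", "A", "XA"): 0.3,
    ("S", "B", "XB"): 0.3,
    ("S", "A", "A"): 0.2,
    ("S", "B", "B"): 0.2,
    ("XA", "S", "A"): 1.0,
    ("XB", "S", "B"): 1.0,
}
LEXICAL = {("A", "a"): 1.0, ("B", "b"): 1.0}  # (parent, terminal) -> probability


def inside_probability(seq, start="S"):
    """Return the probability that `start` derives `seq` (inside recursion)."""
    n = len(seq)
    if n == 0:
        return 0.0
    # beta[(i, j)][X] is the probability that nonterminal X derives seq[i:j].
    beta = defaultdict(lambda: defaultdict(float))
    # Base case: spans of length 1 come from lexical rules.
    for i, symbol in enumerate(seq):
        for (parent, terminal), p in LEXICAL.items():
            if terminal == symbol:
                beta[(i, i + 1)][parent] += p
    # Longer spans: sum over every binary rule and every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    beta[(i, j)][parent] += (
                        p * beta[(i, k)][left] * beta[(k, j)][right]
                    )
    return beta[(0, n)][start]
```

Under this toy grammar, `inside_probability("abba")` evaluates to 0.3 × 0.2 = 0.06, whereas the copy-like string `"abab"` receives probability 0. Crossing dependencies of that copy-like kind cannot be expressed by context-free rules in general, which is consistent with the abstract's motivation for using a more powerful grammar family.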