Predicting Protein Secondary Structure Using Stochastic Tree Grammars

Authors:
Naoki Abe;Hiroshi Mamitsuka
Affiliations:
Theory NEC Laboratory, Real World Computing Partnership C & C Media Research Laboratories, NEC Corporation, 4-1-1 Miyazaki, Miyamae-ku, Kawasaki, 216 Japan. E-mail: abe@ccm.cl.nec.co.jp, mam ...
Venue:
Machine Learning - Special issue on learning with probabilistic representations
Year:
1997

Abstract

We propose a new method for predicting the protein secondary structure of a given amino acid sequence, based on a training algorithm for the probability parameters of a stochastic tree grammar. In particular, we concentrate on the problem of predicting β-sheet regions, which has previously been considered difficult because of the unbounded dependencies exhibited by sequences corresponding to β-sheets. To cope with this difficulty, we use a new family of stochastic tree grammars, which we call Stochastic Ranked Node Rewriting Grammars, that are powerful enough to capture the type of dependencies exhibited by the sequences of β-sheet regions, such as the 'parallel' and 'anti-parallel' dependencies and their combinations. The training algorithm we use is an extension of the 'inside-outside' algorithm for stochastic context-free grammars, but with a number of significant modifications. We applied our method to real data obtained from the HSSP database (Homology-derived Secondary Structure of Proteins, Ver. 1.0), and the results were encouraging: our method was able to predict roughly 75 percent of the β-strands correctly in a systematic evaluation experiment, in which the test sequences not only have less than 25 percent identity to the training sequences, but are totally unrelated to them. This figure compares favorably to the predictive accuracy of the state-of-the-art prediction methods in the field, even though our experiment was on a restricted type of β-sheet structure and the test was done on a relatively small data set. We also stress that our method can predict the structure as well as the location of β-sheet regions, which was not possible with conventional methods for secondary structure prediction. Extended abstracts of parts of the work presented in this paper have appeared in (Abe & Mamitsuka, 1994) and (Mamitsuka & Abe, 1994).
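
For orientation, the 'inside' half of the standard inside-outside algorithm for an ordinary stochastic context-free grammar can be sketched as follows. This is not the authors' extended algorithm for Stochastic Ranked Node Rewriting Grammars, which the abstract does not detail; it is a minimal illustration of the baseline technique that the abstract says is extended. The function name `inside_probability`, the dictionary-based rule encoding, and the one-nonterminal toy grammar are all assumptions made for this example.

```python
from collections import defaultdict


def inside_probability(tokens, unary, binary, start="S"):
    """Probability that a stochastic CFG in Chomsky normal form generates `tokens`.

    unary:  {(A, terminal): prob} for rules A -> terminal   (hypothetical encoding)
    binary: {(A, B, C): prob}     for rules A -> B C        (hypothetical encoding)
    """
    n = len(tokens)
    if n == 0:
        return 0.0
    # inside[(i, j)][A] = probability that nonterminal A derives tokens[i:j]
    inside = defaultdict(lambda: defaultdict(float))
    # Base case: spans of length 1 are covered by terminal rules.
    for i, tok in enumerate(tokens):
        for (a, term), p in unary.items():
            if term == tok:
                inside[(i, i + 1)][a] += p
    # Recursive case: combine two adjacent sub-spans with a binary rule.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                left, right = inside[(i, k)], inside[(k, j)]
                for (a, b, c), p in binary.items():
                    if b in left and c in right:
                        inside[(i, j)][a] += p * left[b] * right[c]
    return inside[(0, n)][start]


# Toy grammar (an assumption for illustration): S -> S S (0.3) | 'a' (0.7)
toy_unary = {("S", "a"): 0.7}
toy_binary = {("S", "S", "S"): 0.3}
likelihood = inside_probability(list("aaa"), toy_unary, toy_binary)
```

The table is filled bottom-up over span lengths, so each entry sums over all split points and all applicable rules. A full inside-outside trainer would add the symmetric 'outside' pass and use both tables to re-estimate the rule probabilities.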