We propose a new method for predicting the protein secondary structure of a given amino acid sequence, based on a training algorithm for the probability parameters of a stochastic tree grammar. In particular, we concentrate on the problem of predicting β-sheet regions, which has previously been considered difficult because of the unbounded dependencies exhibited by sequences corresponding to β-sheets. To cope with this difficulty, we use a new family of stochastic tree grammars, which we call Stochastic Ranked Node Rewriting Grammars. These grammars are powerful enough to capture the types of dependencies exhibited by the sequences of β-sheet regions, such as 'parallel' and 'anti-parallel' dependencies and their combinations. The training algorithm we use is an extension of the 'inside-outside' algorithm for stochastic context-free grammars, with a number of significant modifications. We applied our method to real data obtained from the HSSP database (Homology-derived Secondary Structure of Proteins, version 1.0), and the results were encouraging: in a systematic evaluation experiment, our method correctly predicted roughly 75 percent of the β-strands, even though the test sequences not only had less than 25 percent sequence identity to the training sequences but were also totally unrelated to them. This figure compares favorably with the predictive accuracy of state-of-the-art prediction methods in the field, although our experiment was restricted to a particular type of β-sheet structure and the test was done on a relatively small data set. We also stress that our method can predict the structure as well as the location of β-sheet regions, which conventional secondary structure prediction methods could not do. Extended abstracts of parts of the work presented in this paper have appeared in (Abe & Mamitsuka, 1994) and (Mamitsuka & Abe, 1994).
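The difficulty with β-sheets can be made concrete with a simplified pairing model. In an anti-parallel sheet, the residue pairs formed between two strands nest inside each other, much like the base pairs of an RNA stem, which a context-free rule of the form S → a S b can generate. In a parallel sheet, the same pairs cross each other, giving the crossing dependency structure of the copy language {ww}, a standard example of a language that is not context-free. The Python sketch below assumes two strands of equal length with strict one-to-one residue pairing (no bulges or register shifts); the functions `strand_pairs` and `is_nested` are hypothetical illustrations, not part of the method described above.

```python
from itertools import combinations


def strand_pairs(start1, start2, length, orientation):
    """Residue-index pairs (i, j) for two equal-length strands, strand 1 first.

    Simplifying assumption (hypothetical model): every residue of one strand
    pairs with exactly one residue of the other, with no bulges.
    """
    if orientation == "anti-parallel":
        # First residue of strand 1 pairs with the last residue of strand 2.
        return [(start1 + k, start2 + length - 1 - k) for k in range(length)]
    if orientation == "parallel":
        # Residues pair in the same order along both strands.
        return [(start1 + k, start2 + k) for k in range(length)]
    raise ValueError(f"unknown orientation: {orientation!r}")


def is_nested(pairs):
    """True if no two pairs (i, j), (k, l) cross, i.e. never i < k < j < l."""
    for (i, j), (k, l) in combinations(sorted(pairs), 2):
        if i < k < j < l:
            return False
    return True


if __name__ == "__main__":
    anti = strand_pairs(0, 10, 4, "anti-parallel")
    par = strand_pairs(0, 10, 4, "parallel")
    print(anti, is_nested(anti))  # [(0, 13), (1, 12), (2, 11), (3, 10)] True
    print(par, is_nested(par))    # [(0, 10), (1, 11), (2, 12), (3, 13)] False
```

Nested pairs can be produced by a context-free grammar that emits both partners of a pair in a single rule application; crossing pairs cannot in general, which is one way to see why a grammar more expressive than a context-free one is needed for parallel strands and for combinations of both orientations.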
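The training procedure is described as an extension of the inside-outside algorithm for stochastic context-free grammars. For reference, the Python sketch below shows only the standard inside pass for a stochastic context-free grammar in Chomsky normal form, which computes the probability of a sequence by summing over all of its parse trees with dynamic programming. It is not the Stochastic Ranked Node Rewriting Grammar algorithm itself: the outside pass, the re-estimation of rule probabilities, and the authors' modifications are all omitted, and the function `inside_probability` and the toy grammar (nonterminals S, T, L, R) are hypothetical.

```python
from collections import defaultdict


def inside_probability(words, binary_rules, lexical_rules, start="S"):
    """Return P(words) under a stochastic CFG in Chomsky normal form.

    binary_rules:  {(A, B, C): P(A -> B C)}
    lexical_rules: {(A, a): P(A -> a)}
    """
    n = len(words)
    # beta[i][j][A] = inside probability that nonterminal A derives words[i:j]
    beta = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Base case: spans of length 1 come from lexical rules A -> a.
    for i, w in enumerate(words):
        for (A, a), p in lexical_rules.items():
            if a == w:
                beta[i][i + 1][A] += p

    # Longer spans: sum over split points k and binary rules A -> B C.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary_rules.items():
                    left = beta[i][k].get(B, 0.0)
                    right = beta[k][j].get(C, 0.0)
                    if left and right:
                        beta[i][j][A] += p * left * right

    return beta[0][n].get(start, 0.0)


# Toy grammar (hypothetical): each 'a' on the left pairs with a 'b' on the
# right in nested order, a crude stand-in for anti-parallel pairing.
BINARY = {("S", "L", "R"): 0.75, ("S", "L", "T"): 0.25, ("T", "S", "R"): 1.0}
LEXICAL = {("L", "a"): 1.0, ("R", "b"): 1.0}

if __name__ == "__main__":
    for s in ["ab", "aabb", "aab"]:
        print(s, inside_probability(list(s), BINARY, LEXICAL))
```

With this grammar, "aabb" has the single derivation S → L T, T → S R, S → L R, so its probability is 0.25 × 0.75 = 0.1875, while "aab" cannot be derived and gets 0. Filling the table takes O(n³ · |rules|) time, the usual cubic cost of chart-based parsing; the outside pass reuses the same table shape to obtain the expected rule counts needed for re-estimation.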