Corpus based learning of stochastic, context-free grammars combined with Hidden Markov Models for tRNA modelling

Authors:
Juan Miguel García-Gómez; José Miguel Benedí; Javier Vicente; Montserrat Robles
Affiliations:
Informática Médica-BET, Polytechnic University of Valencia, Camino de Vera s/n, Valencia, 46022, Spain (García-Gómez, Vicente, Robles); Dpto. Sistemas Informáticos y Computación, Polytechnic University of Valencia, 9, Camino de Vera s/n, Valencia, 46022, Spain (Benedí)
Venue:
International Journal of Bioinformatics Research and Applications
Year:
2005

References (8)

Efficient learning of context-free grammars from positive structural examples. Information and Computation.
An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics.
Bayesian learning of probabilistic language models.
An efficient context-free parsing algorithm. Communications of the ACM.
The theory of parsing, translation, and compiling.
Combination of Estimation Algorithms and Grammatical Inference Techniques to Learn Stochastic Context-Free Grammars. ICGI '00: Proceedings of the 5th International Colloquium on Grammatical Inference: Algorithms and Applications.
Tree-bank Grammars.
A hybrid language model based on a combination of N-grams and stochastic context-free grammars. ACM Transactions on Asian Language Information Processing (TALIP).

Abstract

In this paper, a new method for modelling tRNA secondary structures is presented. The method combines stochastic context-free grammars (SCFGs) and Hidden Markov Models (HMMs): HMMs capture the local relations in the loops of the molecule (nonstructured regions), and SCFGs capture the long-range relations between nucleotides of the arms (structured regions). The HMM and SCFG models are learned from annotated public databases by means of automatic inductive learning methods. Two SCFG learning methods have been explored, and both take advantage of the structural information associated with the training sequences: one is based on a stochastic version of the Sakakibara algorithm, and the other on a corpus-based algorithm. A final model is then obtained by merging the HMM of the nonstructured regions with the SCFG of the structured regions. Finally, experiments on the tRNA sequence corpus and the non-tRNA sequence corpus give significant results, and comparative experiments with another published method are also presented.
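The abstract does not spell out the corpus-based SCFG learner. One common reading of "corpus-based", in the spirit of treebank grammars, is to treat each training sequence's annotated structure as its parse tree and estimate rule probabilities by relative frequency, P(A → α) = count(A → α) / count(A). The Python sketch below illustrates only that idea, under stated assumptions: structures are given in dot-bracket notation, and the grammar is a hypothetical toy one, S → a S | a S b S | ε, which for simplicity also generates the unpaired loop nucleotides that the paper assigns to HMMs. The names `derivation`, `estimate` and `log_prob` are illustrative; this is not the authors' grammar or implementation.

```python
from collections import Counter
from math import log

# A rule is (left-hand side, right-hand side); the empty tuple encodes S -> epsilon.
Rule = tuple[str, tuple[str, ...]]


def match_pairs(structure: str) -> dict[int, int]:
    """Map each '(' position to the position of its matching ')'."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs[stack.pop()] = i
        elif c != ".":
            raise ValueError(f"unexpected symbol {c!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def derivation(seq: str, structure: str) -> list[Rule]:
    """Leftmost derivation of an annotated sequence under the hypothetical
    toy grammar S -> a S | a S b S | epsilon."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    pairs = match_pairs(structure)
    rules: list[Rule] = []

    def walk(i: int, j: int) -> None:
        # Derive seq[i:j]; the right-recursive tail is handled by the loop.
        while i < j:
            if structure[i] == ".":          # unpaired nucleotide: S -> a S
                rules.append(("S", (seq[i], "S")))
                i += 1
            else:                            # base pair (i, k): S -> a S b S
                k = pairs[i]
                rules.append(("S", (seq[i], "S", seq[k], "S")))
                walk(i + 1, k)               # derive the region inside the pair
                i = k + 1                    # continue after the closing base
        rules.append(("S", ()))              # end of segment: S -> epsilon

    walk(0, len(seq))
    return rules


def estimate(corpus: list[tuple[str, str]]) -> dict[Rule, float]:
    """Relative-frequency (treebank-style) estimate: count(A -> alpha) / count(A)."""
    counts: Counter = Counter()
    for seq, structure in corpus:
        counts.update(derivation(seq, structure))
    lhs_totals: Counter = Counter()
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}


def log_prob(grammar: dict[Rule, float], seq: str, structure: str) -> float:
    """Log-probability of the unique derivation of an annotated sequence."""
    total = 0.0
    for rule in derivation(seq, structure):
        p = grammar.get(rule, 0.0)
        if p == 0.0:
            return float("-inf")  # rule never observed in the training corpus
        total += log(p)
    return total


# Toy usage with two made-up hairpins (not data from the paper).
corpus = [("GGGAAACCC", "(((...)))"), ("GCAUAGC", "((...))")]
grammar = estimate(corpus)
for rule, p in sorted(grammar.items(), key=lambda kv: -kv[1]):
    print(rule, round(p, 3))
```

Because each annotated structure fixes a single derivation under this toy grammar, plain counting suffices; with unannotated sequences, rule probabilities would instead have to be estimated iteratively, for example with the inside-outside algorithm.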