Motivation. Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While parameter estimation has been shifting successfully toward ML-based approaches, the model parameterizations have so far remained fairly constant, and all models to date have relatively few parameters. We propose a move to much richer parameterizations.

Contribution. We study how increasing the amount of information utilized by folding prediction models can improve their prediction quality. To this end, we propose novel models that refine previous ones by examining more types of structural elements, and larger sequential contexts for these elements. We argue that with suitable learning techniques, freedom from restricting models to features whose weights can be determined experimentally, and a large enough set of examples, one can define much richer feature representations than previously explored, while still allowing efficient inference. Our proposed fine-grained models are made practical by the availability of large training sets, advances in machine learning, and recent accelerations of RNA folding algorithms.

Results. To test our assumption, we conducted a set of experiments that assess the prediction quality of the proposed models. These experiments reproduce the settings of recent, thorough work that compared the prediction quality of several state-of-the-art RNA folding algorithms. We show that applying more detailed models indeed improves prediction quality, while the corresponding running time of the folding algorithm remains fast.
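To illustrate the idea of a richer parameterization, the following is a minimal sketch (not the paper's actual model or feature set) of scoring a candidate base pair with a linear model over fine-grained features. The feature names and weights here are hypothetical; the point is that adding features for larger sequential contexts simply adds terms to a weighted sum, so the score remains a linear function that a dynamic-programming folding algorithm can optimize efficiently.

```python
def pair_features(seq, i, j):
    """Hypothetical feature set for pairing positions i and j: the pair
    identity itself, plus one flanking nucleotide of sequence context on
    each side when available. Richer models would add more such features."""
    feats = [f"pair:{seq[i]}{seq[j]}"]
    if i > 0:
        feats.append(f"left_ctx:{seq[i-1]}|{seq[i]}{seq[j]}")
    if j + 1 < len(seq):
        feats.append(f"right_ctx:{seq[i]}{seq[j]}|{seq[j+1]}")
    return feats

def pair_score(seq, i, j, weights):
    """Linear score: sum of learned weights for the active features.
    Unseen features contribute 0, so the parameter set can be sparse."""
    return sum(weights.get(f, 0.0) for f in pair_features(seq, i, j))

# Toy usage with made-up weights: score the G-C pair at positions 1, 2
# of "AGCU", which also activates the left-context feature "left_ctx:A|GC".
w = {"pair:GC": 3.0, "pair:AU": 2.0, "left_ctx:A|GC": 0.5}
score = pair_score("AGCU", 1, 2, w)
```

In an ML setting the weights are estimated from annotated structures rather than measured experimentally, which is what allows the number of parameters to grow far beyond what thermodynamic experiments could supply.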
An additional important outcome of these experiments is a new RNA folding prediction model (coupled with a freely available implementation), which achieves significantly higher prediction quality than previous models. This final model has about 70,000 free parameters, several orders of magnitude more than previous models. Trained and tested over the same comprehensive data sets, our model achieves a score of 84% under the F1-measure over correctly predicted base pairs (i.e., a 16% error rate), compared to the previously best reported score of 70% (i.e., a 30% error rate). That is, the new model yields an error reduction of about 50%.
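For concreteness, here is a sketch of how an F1-measure over predicted base pairs is typically computed, along with the error-reduction arithmetic behind the figures above. The exact evaluation protocol used in the compared studies may differ in details (e.g., slipped-pair tolerance).

```python
def f1_base_pairs(predicted, reference):
    """F1 over base-pair sets; each pair is an (i, j) position tuple.
    Precision = fraction of predicted pairs that are correct;
    recall = fraction of reference pairs that are recovered."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)  # correctly predicted base pairs
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)

# The reported error reduction: moving from F1 = 0.70 to F1 = 0.84
# shrinks the error rate from 0.30 to 0.16, a relative reduction of
# (0.30 - 0.16) / 0.30, i.e. roughly 47%, "about 50%" as stated above.
old_err, new_err = 1 - 0.70, 1 - 0.84
reduction = (old_err - new_err) / old_err
```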