Journal of Computer Science and Technology
Proximal regularization for online and batch learning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
A majorization-minimization algorithm for (multiple) hyperparameter learning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Lifting Prediction to Alignment of RNA Pseudoknots
RECOMB '09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
Simultaneous Alignment and Folding of Protein Sequences
RECOMB '09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
A novel method for MicroRNA secondary structure prediction using a bottom-up algorithm
Proceedings of the 47th Annual Southeast Regional Conference
Fast RNA Structure Alignment for Crossing Input Structures
CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Automatic parameter learning for multiple network alignment
RECOMB '08 Proceedings of the 12th Annual International Conference on Research in Computational Molecular Biology
A non-parametric Bayesian approach for predicting RNA secondary structures
WABI '09 Proceedings of the 9th International Conference on Algorithms in Bioinformatics
Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach
WABI '10 Proceedings of the 10th International Conference on Algorithms in Bioinformatics
Sparse RNA folding: Time and space efficient algorithms
Journal of Discrete Algorithms
Rich parameterization improves RNA structure prediction
RECOMB '11 Proceedings of the 15th Annual International Conference on Research in Computational Molecular Biology
A combinatorial framework for designing (pseudoknotted) RNA algorithms
WABI '11 Proceedings of the 11th International Conference on Algorithms in Bioinformatics
Designing RNA secondary structures in coding regions
ISBRA '12 Proceedings of the 8th International Conference on Bioinformatics Research and Applications
Lynx: a programmatic SAT solver for the RNA-folding problem
SAT '12 Proceedings of the 15th International Conference on Theory and Applications of Satisfiability Testing
International Journal of Data Mining and Bioinformatics
A parallel strategy for predicting the secondary structure of polycistronic microRNAs
International Journal of Bioinformatics Research and Applications
RNA secondary structure prediction using conditional random fields model
International Journal of Data Mining and Bioinformatics
An Algorithmic Game-Theory Approach for Coarse-Grain Prediction of RNA 3D Structure
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
2D Meets 4G: G-Quadruplexes in RNA Secondary Structure Prediction
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Motivation: For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, SCFGs use fully automated statistical learning algorithms to derive model parameters. Despite this advantage, probabilistic methods have not replaced free energy minimization methods as the tool of choice for secondary structure prediction, because the accuracies of the best current SCFGs have yet to match those of the best physics-based models.

Results: In this paper, we present CONTRAfold, a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models that generalize SCFGs by using discriminative training and feature-rich scoring. In a series of cross-validation experiments, we show that grammar-based secondary structure prediction methods formulated as CLLMs consistently outperform their SCFG analogs. Furthermore, CONTRAfold, a CLLM incorporating most of the features found in typical thermodynamic models, achieves the highest single sequence prediction accuracies to date, outperforming currently available probabilistic and physics-based techniques. Our result thus closes the gap between probabilistic and thermodynamic models, demonstrating that statistical learning procedures provide an effective alternative to empirical measurement of thermodynamic parameters for RNA secondary structure prediction.

Availability: Source code for CONTRAfold is available at.

Contact: chuongdo@cs.stanford.edu
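
The conditional log-linear model named under Results can be illustrated with a minimal sketch. It is not CONTRAfold's code and assumes a great deal: the candidate structures, the two features (counts of base pairs and of unpaired bases), and the weights are all hypothetical, and the candidates are enumerated explicitly, which a real predictor could not do because the number of possible structures grows too quickly. The sketch only shows the general form P(y | x) ∝ exp(w · F(x, y)) and the gradient that discriminative (conditional likelihood) training would follow.

```python
import math


def cllm_probabilities(feature_vectors, weights):
    """Return P(y | x) for each candidate structure y.

    feature_vectors: dict mapping a candidate label to its feature counts F(x, y)
    weights: list of parameters w, one per feature
    """
    scores = {
        y: sum(w * f for w, f in zip(weights, feats))
        for y, feats in feature_vectors.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp_scores.values())  # partition function over the candidates
    return {y: e / z for y, e in exp_scores.items()}


def log_likelihood_gradient(feature_vectors, weights, true_label):
    """Gradient of log P(true_label | x) with respect to the weights:
    observed feature counts minus expected feature counts under the model."""
    probs = cllm_probabilities(feature_vectors, weights)
    n = len(weights)
    expected = [
        sum(probs[y] * feature_vectors[y][i] for y in feature_vectors)
        for i in range(n)
    ]
    return [feature_vectors[true_label][i] - expected[i] for i in range(n)]


# Hypothetical toy data: three candidate structures for one 6-base sequence,
# each summarized as [number of base pairs, number of unpaired bases].
candidates = {
    "((..))": [2.0, 2.0],
    "(....)": [1.0, 4.0],
    "......": [0.0, 6.0],
}
weights = [1.0, -0.1]  # made-up values, not learned parameters

probs = cllm_probabilities(candidates, weights)
grad = log_likelihood_gradient(candidates, weights, "((..))")
```

Here `probs` is a distribution over the three candidates, and `grad` points in the direction that raises the probability of the structure labeled as correct. Because the features are arbitrary functions of the sequence and structure, this form can accommodate scoring terms beyond the rule probabilities of an SCFG, which is the flexibility the abstract refers to.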