Protein threading is one of the most successful methods for protein structure prediction. Most threading methods measure the quality of a sequence-template alignment with a scoring function that linearly combines sequence and structure features, so that a dynamic programming algorithm can optimize it. However, a linear scoring function cannot fully exploit interdependencies among features and thus limits alignment accuracy. This paper presents a nonlinear scoring function for protein threading that not only models interactions among different protein features but can also be optimized efficiently by dynamic programming. We achieve this by modeling the threading problem with a probabilistic graphical model, the Conditional Random Field (CRF), and training the model with the gradient tree boosting algorithm. The resulting model is a nonlinear scoring function consisting of a collection of regression trees, each of which models one type of nonlinear relationship among sequence and structure features. Experimental results indicate that the new threading model effectively leverages weak biological signals and substantially improves both alignment accuracy and fold recognition rate.
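To illustrate why a nonlinear score can still be optimized by dynamic programming, here is a minimal Python sketch. It is not the paper's model: it omits the CRF states and the gradient tree boosting training, and uses a plain global-alignment recurrence with a fixed gap penalty. The names `align`, `linear_score`, `tree_ensemble_score`, and `toy_tree`, the two-feature layout, and all numeric values are assumptions made up for illustration. The point it shows is that the recurrence only needs a score per aligned position pair, so that score may be a sum of regression trees rather than a weighted sum of features.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]
Scorer = Callable[[Features], float]


def linear_score(weights: Sequence[float]) -> Scorer:
    """Linear scoring: a weighted sum of features, with no feature interactions."""
    return lambda f: sum(w * x for w, x in zip(weights, f))


def tree_ensemble_score(trees: List[Scorer]) -> Scorer:
    """Nonlinear scoring: the sum of the outputs of several regression trees."""
    return lambda f: sum(tree(f) for tree in trees)


def toy_tree(f: Features) -> float:
    """Hypothetical depth-2 regression tree over two made-up features.

    f[0] is a sequence-similarity signal and f[1] a structure-agreement signal.
    The large reward is given only when both signals agree, which is an
    interaction a linear combination cannot express.
    """
    seq_sim, struct_match = f
    if seq_sim > 0.5:
        return 2.0 if struct_match > 0.5 else 0.5
    return -1.0


def align(features: Sequence[Sequence[Features]], score: Scorer, gap: float = -1.0) -> float:
    """Return the best global alignment score by dynamic programming.

    features[i][j] is the feature vector for aligning sequence position i
    to template position j. The gap penalty is a fixed assumed constant.
    """
    n = len(features)
    m = len(features[0]) if n else 0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(features[i - 1][j - 1]),  # align i with j
                best[i - 1][j] + gap,  # sequence position i left unaligned
                best[i][j - 1] + gap,  # template position j left unaligned
            )
    return best[n][m]


# Made-up 2x2 example: positions on the diagonal look like good matches.
features = [
    [(0.9, 1.0), (0.1, 0.0)],
    [(0.2, 0.0), (0.8, 1.0)],
]
print(align(features, linear_score([1.0, 1.0])))
print(align(features, tree_ensemble_score([toy_tree])))
```

The same `align` routine serves both scorers because the recurrence never inspects how a position-pair score was computed. In the approach the abstract describes, the trees would be learned by gradient tree boosting rather than written by hand.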