Boosting Protein Threading Accuracy

Authors:
Jian Peng;Jinbo Xu
Affiliations:
Toyota Technological Institute at Chicago, Chicago, USA 60637;Toyota Technological Institute at Chicago, Chicago, USA 60637
Venue:
RECOMB '09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
Year:
2009

Abstract

Protein threading is one of the most successful methods for protein structure prediction. Most threading methods measure the quality of a sequence-template alignment with a scoring function that linearly combines sequence and structure features, so that the score can be optimized by dynamic programming. However, a linear scoring function cannot fully exploit interdependencies among features and thus limits alignment accuracy. This paper presents a nonlinear scoring function for protein threading that models interactions among different protein features and can still be optimized efficiently by dynamic programming. We achieve this by formulating threading as a probabilistic graphical model, a conditional random field (CRF), and training the model with the gradient tree boosting algorithm. The resulting model is a nonlinear scoring function consisting of a collection of regression trees, each of which models one type of nonlinear relationship among sequence and structure features. Experimental results indicate that the new threading model effectively leverages weak biological signals and greatly improves both alignment accuracy and fold recognition rate.
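The central point of the abstract is that dynamic programming only requires the alignment score to decompose over aligned positions, so the per-position score can itself be nonlinear. The sketch below illustrates that idea only and is not the authors' implementation. The two features (profile similarity and secondary-structure agreement), the hand-written trees `tree_a` and `tree_b`, the constant gap penalty, and the function names are all hypothetical. In the paper the trees are learned by gradient tree boosting inside a CRF, which this example does not attempt to reproduce.

```python
from typing import Callable, Sequence

# Hypothetical features for aligning sequence position i to template position j:
# (profile similarity, secondary-structure agreement), both assumed to lie in [0, 1].
Features = Sequence[float]


def tree_a(f: Features) -> float:
    # Hand-written depth-2 "regression tree" (an assumption, not a learned tree).
    # It encodes a feature interaction: high similarity is rewarded more
    # when the secondary structure also agrees.
    sim, ss = f
    if sim > 0.5:
        return 2.0 if ss > 0.5 else 0.5
    return -1.0


def tree_b(f: Features) -> float:
    # Second hand-written tree that looks only at secondary-structure agreement.
    _, ss = f
    return 0.5 if ss > 0.5 else -0.5


def ensemble_score(f: Features, trees=(tree_a, tree_b)) -> float:
    # Nonlinear match score: the sum of the outputs of all trees.
    return sum(tree(f) for tree in trees)


def align_score(
    feats: Sequence[Sequence[Features]],
    score: Callable[[Features], float] = ensemble_score,
    gap: float = -1.0,
) -> float:
    # Global alignment by dynamic programming with a constant gap penalty.
    # feats[i][j] holds the features for sequence residue i and template residue j.
    n = len(feats)
    m = len(feats[0]) if n else 0
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(feats[i - 1][j - 1]),  # align i with j
                dp[i - 1][j] + gap,  # sequence residue i aligned to a gap
                dp[i][j - 1] + gap,  # template residue j aligned to a gap
            )
    return dp[n][m]


# Toy 2x2 example: strong signal on the diagonal, weak signal elsewhere.
strong, weak = (0.9, 0.9), (0.1, 0.1)
toy_feats = [[strong, weak], [weak, strong]]
best = align_score(toy_feats)
```

Swapping `ensemble_score` for a linear function of the same features leaves the recursion unchanged, which is why replacing a linear score with a tree ensemble does not change the cost of the alignment step.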