Fold recognition by combining profile-profile alignment and support vector machine

Authors:
Sangjo Han;Byung-Chul Lee;Seung Taek Yu;Chan-Seok Jeong;Soyoung Lee;Dongsup Kim
Affiliations:
Department of Biosystems, Korea Advanced Institute of Science and Technology, Daejeon, 305-701, Korea
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Venue:
Bioinformatics
Year:
2005



Abstract

Motivation: Currently, the most accurate fold-recognition approach is to perform profile-profile alignments and to estimate the statistical significance of those alignments by calculating a Z-score or E-value. Although this scheme reliably recognizes relatively close homologs related at the family level, it has difficulty finding remote homologs that are related at the superfamily or fold level.

Results: We present an alternative method for estimating the significance of the alignments. The alignment between a query protein and a template of length n in the fold library is transformed into a feature vector of length n + 1, which is then evaluated by a support vector machine (SVM). The SVM output is converted to a posterior probability that the query sequence is related to the template. The new method performs significantly better than PSI-BLAST and than profile-profile alignment with the Z-score scheme. At 90% specificity, PSI-BLAST and the Z-score scheme detect 16% and 20% of superfamily-related proteins, respectively, whereas the new method detects 46%, a more than 2-fold increase in sensitivity. More significantly, at the fold level, the new method detects 14% of remotely related proteins at 90% specificity, a remarkable result given that the other methods detect almost none at the same specificity.

Contact: kds@kaist.ac.kr
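The steps summarized in the abstract (alignment to feature vector, SVM output to posterior probability, sensitivity at a fixed specificity) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The abstract does not specify the feature layout or the probability mapping, so the code assumes one per-position alignment score for each of the n template positions plus the total score as the final element, and a Platt-style sigmoid with hypothetical parameters `a` and `b`. All function names are invented for illustration, and the SVM decision value is taken as given rather than computed.

```python
import math


def alignment_to_features(position_scores, template_length):
    """Build a feature vector of length n + 1 for a template of length n.

    Assumed layout (not stated in the abstract): slots 0..n-1 hold the
    alignment score at each template position (0.0 if unaligned), and
    slot n holds the total alignment score.
    """
    features = [0.0] * (template_length + 1)
    for position, score in position_scores.items():
        features[position] = score
    features[template_length] = sum(position_scores.values())
    return features


def posterior_from_svm_output(decision_value, a=-1.0, b=0.0):
    """Map an SVM decision value to a probability with a sigmoid.

    Assumption: a Platt-style mapping P = 1 / (1 + exp(a * f + b));
    a and b are hypothetical and would be fitted on held-out data.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def sensitivity_at_specificity(scores, labels, specificity=0.9):
    """Fraction of related pairs detected at a given specificity.

    The threshold is the lowest unrelated-pair score such that at least
    `specificity` of unrelated pairs score at or below it; a related
    pair counts as detected only if it scores strictly above it.
    """
    negatives = sorted(s for s, related in zip(scores, labels) if not related)
    positives = [s for s, related in zip(scores, labels) if related]
    k = max(math.ceil(specificity * len(negatives)) - 1, 0)
    threshold = negatives[k]
    detected = sum(1 for s in positives if s > threshold)
    return detected / len(positives)
```

For example, a template of length 3 with aligned positions 0 and 2 yields a vector of length 4, and a decision value of 0 maps to a posterior of 0.5 under the default hypothetical parameters.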