Protein solvent accessibility prediction using support vector machines and sequence conservations

Authors:
Hasan Oğul;Erkan Ü. Mumcuoğlu
Affiliations:
Department of Computer Engineering, Başkent University, Ankara, Turkey;Information Systems and Health Informatics, Informatics Institute, Middle East Technical University, Ankara, Turkey
Venue:
TAINN'05 Proceedings of the 14th Turkish conference on Artificial Intelligence and Neural Networks
Year:
2005

Abstract

A two-stage method is developed for single-sequence prediction of protein solvent accessibility, using only the amino acid sequence. The first stage classifies each residue in a protein sequence as exposed or buried using a support vector machine (SVM). The features used in the SVM are physico-chemical properties of the amino acid to be predicted, together with information from its neighboring residues. In the second stage, the SVM-based predictions are refined using pairwise conservative patterns called maximal unique matches (MUMs). The MUMs are identified using an efficient data structure, the suffix tree. The baseline predictions, SVM-based predictions and MUM-based refinements are tested on a nonredundant protein data set, and approximately 73% prediction accuracy is achieved for a solvent accessibility threshold that provides an even distribution between the buried and exposed classes. The results demonstrate that the new method achieves slightly better accuracy than recent methods using single-sequence prediction.
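To make the notion of a maximal unique match concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It takes a MUM to be a substring that occurs exactly once in each of two sequences and cannot be extended in either direction without a mismatch. The abstract states that MUMs are found with a suffix tree; this sketch swaps in a simple quadratic scan for brevity. The function names, the `min_len` parameter, and the example sequences are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a, b, min_len=2):
    """Return (pos_in_a, pos_in_b, length) for each MUM of a and b.

    Brute-force scan used only for illustration (assumption); a suffix
    tree, as named in the abstract, would find the same matches faster.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the residues agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            match = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, k))
    return mums


# Hypothetical toy sequences, not taken from the paper's data set.
print(maximal_unique_matches("ACDEFGHIK", "WWDEFGWWHIK"))
# prints [(2, 2, 4), (6, 8, 3)]
```

Each tuple gives the start position in the first sequence, the start position in the second, and the match length, so the two matches here are "DEFG" and "HIK". How such matches are then used to refine the SVM output is not specified in the abstract and is not modeled here.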