Brief communication: SVM-BALSA: Remote homology detection based on Bayesian sequence alignment

Authors:
Bobbie-Jo Webb-Robertson;Christopher Oehmen;Melissa Matzke
Affiliations:
Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA;Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA;Decision and Sensor Analytics, Pacific Northwest National Laboratory, Richland, WA 99352, USA
Venue:
Computational Biology and Chemistry
Year:
2005

Abstract

Biopolymer sequence comparison to identify evolutionarily related proteins, or homologs, is one of the most common tasks in bioinformatics. Support vector machines (SVMs) represent a new approach to the problem in which statistical learning theory is employed to classify proteins into families, thus identifying homologous relationships. Current SVM approaches have been shown to outperform iterative profile methods, such as PSI-BLAST, for protein homology classification. In this study, we demonstrate that using a Bayesian alignment score, which accounts for the uncertainty of all possible alignments, in the SVM construction improves sensitivity compared to the traditional dynamic programming implementation on a benchmark dataset consisting of 54 unique protein families. The SVM-BALSA algorithm returns a higher area under the receiver operating characteristic (ROC) curve for 37 of the 54 families and achieves an improved overall performance curve at a significance level of 0.07.
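
The per-family comparison described in the abstract can be illustrated with a minimal sketch. It assumes each method yields one numeric score per test protein, with larger scores meaning more likely family membership. The function names `roc_auc` and `count_wins` and the toy scores are hypothetical and are not taken from the paper; the area under the ROC curve is computed here through its standard rank-sum equivalence, which may differ from the authors' own procedure.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: classifier outputs, larger means more likely a family member.
    labels: 1 for family members (positives), 0 for non-members.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def count_wins(auc_a, auc_b):
    """Number of families where method A has a strictly higher AUC than B."""
    return sum(a > b for a, b in zip(auc_a, auc_b))


# Toy data for a single hypothetical family (not from the benchmark).
labels = [1, 1, 0, 0]
auc_method_a = roc_auc([0.9, 0.8, 0.3, 0.1], labels)
auc_method_b = roc_auc([0.9, 0.2, 0.8, 0.1], labels)
print(auc_method_a, auc_method_b)
```

Applying `roc_auc` to every family for both methods and passing the two lists of values to `count_wins` gives a tally of the kind reported in the abstract (37 of 54 families).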