Brief communication: SVM-BALSA: Remote homology detection based on Bayesian sequence alignment

  • Authors:
  • Bobbie-Jo Webb-Robertson; Christopher Oehmen; Melissa Matzke

  • Affiliations:
  • Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA; Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA; Decision and Sensor Analytics, Pacific Northwest National Laboratory, Richland, WA 99352, USA

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2005

Abstract

Biopolymer sequence comparison to identify evolutionarily related proteins, or homologs, is one of the most common tasks in bioinformatics. Support vector machines (SVMs) represent a new approach to this problem in which statistical learning theory is employed to classify proteins into families, thus identifying homologous relationships. Current SVM approaches have been shown to outperform iterative profile methods, such as PSI-BLAST, for protein homology classification. In this study, we demonstrate that using a Bayesian alignment score, which accounts for the uncertainty of all possible alignments, in the SVM construction improves sensitivity compared to the traditional dynamic programming implementation on a benchmark dataset of 54 unique protein families. The SVM-BALSA algorithm achieves a higher area under the receiver operating characteristic (ROC) curve for 37 of the 54 families and an improved overall performance curve at a significance level of 0.07.
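The contrast between a traditional dynamic programming score and a score that accounts for all possible alignments can be sketched in Python. This is a minimal illustration under stated assumptions, not the SVM-BALSA implementation: it uses a hypothetical toy scoring scheme (match +2, mismatch -1, linear gap -2), global rather than local alignment, and a hypothetical `temperature` parameter, and it does not reproduce BALSA's actual scoring model or priors. `best_score` keeps only the single highest-scoring alignment, whereas `summed_score` combines every alignment path through a log-sum-exp recursion. `feature_vector` is a hypothetical helper that assumes each protein is represented by its vector of alignment scores against a set of training sequences, which would then be passed to an SVM.

```python
import math

# Hypothetical toy scoring scheme (an assumption; not BALSA's parameters).
MATCH, MISMATCH, GAP = 2.0, -1.0, -2.0


def sub(a, b):
    """Substitution score for aligning residue a with residue b."""
    return MATCH if a == b else MISMATCH


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def best_score(x, y):
    """Needleman-Wunsch: score of the single best global alignment."""
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * GAP
    for j in range(1, m + 1):
        D[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          D[i - 1][j] + GAP,
                          D[i][j - 1] + GAP)
    return D[n][m]


def summed_score(x, y, temperature=1.0):
    """T * log of the sum over all global alignments of exp(score / T).

    Replaces max with log-sum-exp, so every alignment contributes;
    as temperature -> 0 this approaches best_score.
    """
    n, m, t = len(x), len(y), temperature
    Z = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        Z[i][0] = i * GAP / t
    for j in range(1, m + 1):
        Z[0][j] = j * GAP / t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            Z[i][j] = logsumexp([Z[i - 1][j - 1] + sub(x[i - 1], y[j - 1]) / t,
                                 Z[i - 1][j] + GAP / t,
                                 Z[i][j - 1] + GAP / t])
    return Z[n][m] * t


def feature_vector(query, training_set, scorer):
    """Hypothetical SVM input: one alignment score per training sequence."""
    return [scorer(query, s) for s in training_set]


if __name__ == "__main__":
    train = ["HEAGAWGHEE", "PAWHEAE", "AWGHE"]
    print(feature_vector("PAWHEAE", train, best_score))
    print(feature_vector("PAWHEAE", train, summed_score))
```

Because log-sum-exp is never smaller than the maximum of its inputs, `summed_score` is always at least `best_score` for the same pair; the gap between them reflects how much probability mass lies in alternative, near-optimal alignments, which a max-only score ignores.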