A comparison of scoring functions for protein sequence profile alignment

  • Authors:
  • Robert C. Edgar; Kimmen Sjölander

  • Affiliations:
  • 195 Roque Moraes Drive, Mill Valley, CA 94941, USA; Department of Bioengineering, University of California, Berkeley, CA 94720, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2004

Quantified Score

Hi-index 3.84

Abstract

Motivation: In recent years, several methods have been proposed for aligning two protein sequence profiles, with reported improvements in alignment accuracy and homolog discrimination versus sequence-sequence methods (e.g. BLAST) and profile-sequence methods (e.g. PSI-BLAST). Profile-profile alignment is also the iterated step in progressive multiple sequence alignment algorithms such as CLUSTALW. However, little is known about the relative performance of different profile-profile scoring functions. In this work, we evaluate the alignment accuracy of 23 different profile-profile scoring functions by comparing alignments of 488 pairs of sequences with identity ≤30% against structural alignments. We optimize parameters for all scoring functions on the same training set and use profiles of alignments from both PSI-BLAST and SAM-T99. Structural alignments are constructed from a consensus between the FSSP database and the CE structural aligner. We compare the results with sequence-sequence and sequence-profile methods, including BLAST and PSI-BLAST.

Results: We find that profile-profile alignment gives an average improvement over our test set of typically 2-3% over profile-sequence alignment and ~40% over sequence-sequence alignment. No statistically significant difference is seen in the relative performance of most of the scoring functions tested. Significantly better results are obtained with profiles constructed from SAM-T99 alignments than from PSI-BLAST alignments.

Availability: Source code, reference alignments and more detailed results are freely available at http://phylogenomics.berkeley.edu/profilealignment/
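The abstract describes scoring test alignments against structural reference alignments but does not say which accuracy measure was used. As a rough illustration only, the sketch below computes one common measure: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names `aligned_pairs` and `pair_accuracy`, the `-` gap character, and the toy sequences are all assumptions made for this example and are not taken from the paper or its source code.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs aligned in a pairwise alignment.

    row_a and row_b are the two gapped rows of the alignment (equal length).
    Indices are 0-based positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # current positions in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def pair_accuracy(test, reference):
    """Fraction of reference-aligned residue pairs reproduced by the test alignment.

    Each argument is a (row_a, row_b) tuple of gapped strings for the same
    two sequences. This is an assumed measure, not necessarily the paper's.
    """
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(*test) & ref) / len(ref)


# Hypothetical toy data: a "structural" reference and an ungapped test alignment
# of the same two made-up sequences.
reference = ("ACDE-F", "AC-EGF")
test = ("ACDEF", "ACEGF")
print(pair_accuracy(test, reference))  # 0.75
```

Averaging such a per-pair score over a test set gives one number per method, which is the kind of quantity the percentage improvements in the Results could be computed from, under the assumptions stated above.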