An exact parallel algorithm to compare very long biological sequences in clusters of workstations

  • Authors:
  • Azzedine Boukerche;Alba Cristina Melo;Edans Flavius Sandes;Mauricio Ayala-Rincon

  • Affiliations:
  • SITE, University of Ottawa, Ottawa, Canada;University of Brasilia, Brasilia, Brazil;University of Brasilia, Brasilia, Brazil;University of Brasilia, Brasilia, Brazil

  • Venue:
  • Cluster Computing
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Biological Sequence Comparison is one of the most important operations in Computational Biology since it is used to determine how similar two sequences are. Smith and Waterman proposed an exact algorithm (SW), based on dynamic programming, that is able to obtain the best local alignment between two sequences in quadratic time and space. In order to compare long biological sequences, SW is rarely used since the computation time and the amount of memory required becomes prohibitive. For this reason, heuristic methods like BLAST are widely used. Although faster, these heuristic methods do not guarantee that the best result will be produced. In this paper, we propose an exact parallel variant of the SW algorithm that obtains the best local alignments in quadratic time and reduced space. The results obtained in two clusters (8-machine and 16-machine) for DNA sequences longer than 32 KBP (kilo base-pairs) were very close to linear and, in some cases, superlinear. For very long DNA sequences (1.6 MBP), we were able to reduce execution time from 12.25 hours to 1.54 hours, in our 8-machine cluster. As far as we know, this is the first time 1.6 MBP sequences are compared with an exact SW variant. In this case, 30240 best local alignments were obtained.