FastLSA: A Fast, Linear-Space, Parallel and Sequential Algorithm for Sequence Alignment

  • Authors:
  • Adrian Driga, Paul Lu, Jonathan Schaeffer, Duane Szafron, Kevin Charter, Ian Parsons

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada (Driga, Lu, Schaeffer, Szafron); no current affiliation known, Canada (Charter, Parsons)

  • Venue:
  • Algorithmica
  • Year:
  • 2006

Abstract

Sequence alignment is a fundamental operation for homology search in bioinformatics. For two DNA or protein sequences of lengths m and n, full-matrix (FM) dynamic programming alignment algorithms such as Needleman-Wunsch and Smith-Waterman take O(m × n) time and use O(m × n) space, which can be prohibitive. Hirschberg's algorithm reduces the space requirement to O(min(m, n)) but performs approximately twice as many operations as the FM algorithms. The Fast Linear-Space Alignment (FastLSA) algorithm adapts to the amount of available memory by trading space for operations: depending on the specific machine, it can effectively use either linear or quadratic space. Our experiments show that, in practice, because of memory caching effects, FastLSA is always as fast as or faster than both the Hirschberg and FM algorithms. To improve the performance of FastLSA further, we have parallelized it using a simple but effective form of wavefront parallelism. Our experimental results show that Parallel FastLSA exhibits good speedups, almost linear for eight or fewer processors, and that its efficiency increases with the size of the sequences being aligned. Consequently, parallel and sequential FastLSA can be used flexibly and effectively, with high performance, in situations where the available space and the number of parallel processors vary greatly.
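To make the space/time trade-off above concrete, here is a minimal sketch of the two baselines the abstract compares against. It is not FastLSA itself and does not model its space-adaptive caching or its wavefront parallelism. `nw_full_matrix` is a full-matrix Needleman-Wunsch that stores all (m+1)(n+1) scores so it can trace back. `hirschberg` keeps only one score row at a time (`last_row`) and recurses on the split point where the optimal path crosses the middle row. Because every recursion level recomputes scores for about half the remaining area, its total work is roughly m·n(1 + 1/2 + 1/4 + ...) ≈ 2·m·n, which is the "approximately twice the operations" cost mentioned above. The scoring scheme (match +1, mismatch -1, linear gap -1) and all function names are illustrative assumptions, not taken from the paper.

```python
# Illustrative scoring scheme (an assumption, not from the paper): linear gap penalty.
MATCH, MISMATCH, GAP = 1, -1, -1


def sub(a: str, b: str) -> int:
    """Substitution score for aligning character a with character b."""
    return MATCH if a == b else MISMATCH


def nw_full_matrix(x: str, y: str) -> tuple[int, str, str]:
    """Full-matrix (FM) Needleman-Wunsch: O(m*n) time and O(m*n) space."""
    m, n = len(x), len(y)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * GAP
    for j in range(1, n + 1):
        D[0][j] = j * GAP
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          D[i - 1][j] + GAP,
                          D[i][j - 1] + GAP)
    # Traceback needs the whole stored matrix; this is what costs O(m*n) space.
    i, j, ax, ay = m, n, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + GAP:
            ax.append(x[i - 1]); ay.append('-'); i -= 1
        else:
            ax.append('-'); ay.append(y[j - 1]); j -= 1
    return D[m][n], ''.join(reversed(ax)), ''.join(reversed(ay))


def last_row(x: str, y: str) -> list[int]:
    """Scores of aligning all of x against every prefix of y, keeping one row: O(len(y)) space."""
    prev = [j * GAP for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * GAP] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(prev[j - 1] + sub(x[i - 1], y[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
        prev = cur
    return prev


def hirschberg(x: str, y: str) -> tuple[str, str]:
    """Linear-space divide-and-conquer alignment (Hirschberg's approach)."""
    if not x:
        return '-' * len(y), y
    if not y:
        return x, '-' * len(x)
    if len(x) == 1 or len(y) == 1:
        # One side has length 1, so the full matrix here is only linear in size.
        _, ax, ay = nw_full_matrix(x, y)
        return ax, ay
    mid = len(x) // 2
    n = len(y)
    left = last_row(x[:mid], y)                   # forward pass over the top half
    right = last_row(x[mid:][::-1], y[::-1])      # reverse pass over the bottom half
    # Column k where an optimal path crosses row `mid`.
    k = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    ax1, ay1 = hirschberg(x[:mid], y[:k])
    ax2, ay2 = hirschberg(x[mid:], y[k:])
    return ax1 + ax2, ay1 + ay2


def score(ax: str, ay: str) -> int:
    """Score an existing gapped alignment under the same scheme."""
    return sum(GAP if '-' in (a, b) else sub(a, b) for a, b in zip(ax, ay))


if __name__ == "__main__":
    x, y = "GATTACA", "GCATGCU"
    s, fx, fy = nw_full_matrix(x, y)
    hx, hy = hirschberg(x, y)
    print(s, fx, fy)
    print(score(hx, hy), hx, hy)
```

Both functions return an optimal alignment with the same score; they may differ in which co-optimal alignment they pick. FastLSA, as the abstract describes, sits between these two extremes by using whatever extra memory is available to cut down Hirschberg's recomputation.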