A sub-quadratic sequence alignment algorithm for unrestricted cost matrices

  • Authors:
  • Maxime Crochemore;Gad M. Landau;Michal Ziv-Ukelson

  • Affiliations:
  • Institut Gaspard-Monge, Université de Marne-la-Vallée, Cité Descartes, Champs-sur-Marne, Marne-la-Vallée Cedex 2, France;Haifa University, Haifa, Israel and Polytechnic University, Six MetroTech Center, Brooklyn, NY;Haifa University, Haifa, Israel and IBM T.J.W. Research Center

  • Venue:
  • SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2002

Abstract

The classical algorithm for computing the similarity between two sequences [36, 39] uses a dynamic programming matrix and compares two strings of size n in O(n²) time. We address the challenge of computing the similarity of two strings in sub-quadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global alignment computations.

The speed-up is achieved by dividing the dynamic programming matrix into variable-sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n²/log n) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(hn²/log n), where h ≤ 1 is the entropy of the text.
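
To make the two ingredients named in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm: it shows only the classical quadratic dynamic program that the paper improves on, and a Lempel-Ziv parse of the kind that could induce the variable-sized blocks. The function names `global_alignment_score` and `lz78_phrases` are hypothetical. The use of a single linear gap penalty and of the LZ78 variant are assumptions, since the abstract says only "Lempel-Ziv parsing".

```python
def global_alignment_score(a, b, score, gap):
    """Classical quadratic baseline: global alignment score of a and b.

    `score(x, y)` may return any weight (an unrestricted scoring matrix);
    `gap` is an assumed linear penalty per inserted or deleted character.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitute or match
                prev[j] + gap,                            # delete a[i-1]
                curr[j - 1] + gap,                        # insert b[j-1]
            )
        prev = curr
    return prev[len(b)]


def lz78_phrases(s):
    """Split s into LZ78 phrases (assumed variant of Lempel-Ziv parsing).

    Each phrase is the shortest prefix of the remaining text that has not
    been seen as a phrase before, i.e. an earlier phrase plus one character.
    """
    seen = {""}
    phrases = []
    current = ""
    for ch in s:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # The trailing phrase may repeat an earlier one.
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    unit = lambda x, y: 1 if x == y else -1
    print(global_alignment_score("ACGT", "AGT", unit, -2))
    print(lz78_phrases("aaaaaa"))
```

For a string of length n over a constant-size alphabet, an LZ78 parse produces O(n/log n) phrases, so repetitive inputs yield few, long phrases. In the abstract's terms, phrase boundaries in the two strings would delimit the blocks of the dynamic programming matrix; how each block is processed faster than cell by cell is the paper's contribution and is not reproduced here.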