A Divide-and-Conquer Implementation of Three Sequence Alignment and Ancestor Inference

  • Authors:
  • Feng Yue; Jijun Tang

  • Venue:
  • BIBM '07 Proceedings of the 2007 IEEE International Conference on Bioinformatics and Biomedicine
  • Year:
  • 2007

Abstract

In this paper, we present an algorithm that simultaneously aligns three biological sequences under an affine gap model and infers their common ancestral sequence. The algorithm can be further extended to perform tree alignment for more sequences, and eventually to unify the two procedures of phylogenetic reconstruction and sequence alignment. Its novelty is that it applies a divide-and-conquer strategy to reduce memory usage from O(n^3) to O(n^2) while remaining based on dynamic programming, so the optimal alignment is still guaranteed. Traditionally, three-sequence alignment has been limited by its huge demand for memory and could only handle sequences shorter than two hundred characters. With the improved algorithm, we can produce the optimal alignment of sequences several thousand characters long.

We implemented our algorithm as a C program package, MSAM. It has been extensively tested with BAliBASE, a real, manually refined multiple sequence alignment database, as well as with simulated datasets generated by Rose (Random Model of Sequence Evolution). We compared our results with those of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. The experiments show that MSAM produces not only better alignments but also better ancestral sequences. The software can be downloaded for free at http://www.cse.sc.edu/phylo/MSAM.html
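The memory reduction described in the abstract can be illustrated with a minimal sketch. It is not the MSAM code: it is written in Python rather than C, uses a simple linear gap penalty instead of the affine gap model, assumes a sum-of-pairs column score, and omits ancestor inference. All function names and scoring parameters are hypothetical. The sketch shows two ideas: the three-dimensional dynamic program only ever needs two O(n^2) planes at a time, and a forward and a backward pass meeting at the midpoint of the first sequence locate a split point for the divide step.

```python
from itertools import product

GAP = "-"

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    # Hypothetical linear-gap scoring; the paper uses affine gaps.
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return gap
    return match if x == y else mismatch

def column_score(a, b, c):
    # Sum-of-pairs score of one alignment column (an assumption of this sketch).
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)

# The seven ways a column can consume characters from the three sequences.
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]

def last_plane(x, y, z):
    """Return P with P[j][k] = best score aligning x, y[:j], z[:k].

    Only the previous and current planes are stored, so memory is
    O(len(y) * len(z)) instead of O(len(x) * len(y) * len(z)).
    """
    ny, nz = len(y), len(z)
    neg_inf = float("-inf")
    prev = None
    for i in range(len(x) + 1):
        cur = [[neg_inf] * (nz + 1) for _ in range(ny + 1)]
        for j in range(ny + 1):
            for k in range(nz + 1):
                if i == 0 and j == 0 and k == 0:
                    cur[j][k] = 0
                    continue
                best = neg_inf
                for di, dj, dk in MOVES:
                    if di > i or dj > j or dk > k:
                        continue
                    src = prev if di else cur
                    a = x[i - 1] if di else GAP
                    b = y[j - 1] if dj else GAP
                    c = z[k - 1] if dk else GAP
                    cand = src[j - dj][k - dk] + column_score(a, b, c)
                    if cand > best:
                        best = cand
                cur[j][k] = best
        prev = cur
    return prev

def optimal_score(x, y, z):
    return last_plane(x, y, z)[len(y)][len(z)]

def split_point(x, y, z):
    """Divide step: return (score, j, k) for cutting x at its midpoint.

    A forward pass over the first half of x and a backward pass over the
    reversed second half are combined to find where y and z should be cut.
    """
    mid = len(x) // 2
    ny, nz = len(y), len(z)
    fwd = last_plane(x[:mid], y, z)
    bwd = last_plane(x[mid:][::-1], y[::-1], z[::-1])
    return max(
        (fwd[j][k] + bwd[ny - j][nz - k], j, k)
        for j in range(ny + 1)
        for k in range(nz + 1)
    )
```

In a full divide-and-conquer aligner, the two sub-problems on either side of the returned split point would be solved recursively to recover the alignment itself. With affine gaps the planes must additionally track which sequences are currently inside a gap, which this sketch leaves out.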