Phase-only filtering for the masses (of DNA data): a new approach to sequence alignment

  • Authors:
  • A.K. Brodzik

  • Affiliations:
  • MITRE Corp., Bedford, MA, USA

  • Venue:
  • IEEE Transactions on Signal Processing - Part II
  • Year:
  • 2006

Abstract

Alignment of DNA segments containing repetitive nucleotide base patterns is an important task in several genomics applications, including DNA sequencing, DNA fingerprinting, pathogen detection, and gene finding. One of the most efficient procedures for this task is the cross-correlation method, whose main computations are the discrete Fourier transform (DFT) and a pointwise multiplication of two complex Fourier transform sequences. In this paper, the standard magnitude-and-phase cross-correlation technique is compared with the lesser-known but closely related phase-only cross-correlation method. It is shown that for a periodic DNA sequence, the standard approach produces significant sidelobes in the cross-correlation, whose magnitude increases with sequence length, while the phase-only approach yields a perfect cross-correlation with zero sidelobes. For a DNA sequence that contains both irregularly distributed symbols and periodic patterns, the difference in performance is less pronounced but still significant. Numerical experiments on synthesized and real data demonstrate that the phase-only approach is robust to isolated symbol insertions and deletions and that it can identify the positions of matching segments in the sequence.
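The two correlation variants compared in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The mapping of A, C, G, T to 1, j, -1, -j is a hypothetical encoding chosen for the example. The DFT is a naive O(N²) loop rather than an FFT, and the correlation is circular. The names `encode`, `cross_correlation`, `best_lag` and the `eps` threshold are also hypothetical. Both variants multiply one spectrum pointwise by the conjugate of the other and invert the result. The phase-only variant additionally divides each bin by its magnitude before inverting.

```python
import cmath

# Hypothetical encoding (an assumption, not taken from the paper):
# each base is mapped to a fourth root of unity.
BASE_TO_COMPLEX = {"A": 1 + 0j, "C": 1j, "G": -1 + 0j, "T": -1j}


def encode(seq):
    """Map a DNA string to a list of complex numbers using the assumed encoding."""
    return [BASE_TO_COMPLEX[b] for b in seq.upper()]


def dft(x):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive O(N^2) inverse DFT, including the 1/N scaling."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def cross_correlation(a, b, phase_only=False, eps=1e-9):
    """Circular cross-correlation r[m] = sum_n a[n+m] * conj(b[n]), computed via the DFT.

    With phase_only=True, each cross-spectrum bin is divided by its magnitude so
    that only phase information is kept; bins with near-zero magnitude are set to 0.
    Returns |r[m]| for every lag m.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    spec_a, spec_b = dft(encode(a)), dft(encode(b))
    # Pointwise product of one spectrum with the conjugate of the other.
    cross = [ak * bk.conjugate() for ak, bk in zip(spec_a, spec_b)]
    if phase_only:
        cross = [c / abs(c) if abs(c) > eps else 0j for c in cross]
    return [abs(v) for v in idft(cross)]


def best_lag(scores):
    """Lag with the largest correlation magnitude."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    reference = "AACGTAACTGAC"
    shift = 5
    shifted = reference[-shift:] + reference[:-shift]  # circular shift right by 5
    print("standard lag:", best_lag(cross_correlation(shifted, reference)))
    print("phase-only lag:", best_lag(cross_correlation(shifted, reference, phase_only=True)))
```

Both calls should report lag 5 for this toy sequence. The phase-only normalization gives every retained frequency bin unit weight. As a result, when no bin is zeroed, the phase-only output for a circularly shifted copy is a unit impulse at the shift, up to rounding. The standard output instead spreads energy across all lags. This toy example does not attempt to reproduce the paper's experiments on periodic or real DNA data.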