DNA Sequence Compression Using the Burrows-Wheeler Transform

  • Authors:
  • Don Adjeroh; Yong Zhang; Amar Mukherjee; Matt Powell; Tim Bell

  • Venue:
  • CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
  • Year:
  • 2002

Abstract

We investigate off-line, dictionary-oriented approaches to DNA sequence compression based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods to compress DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed based on the relationship between the BWT and important pattern matching data structures, such as the suffix tree and suffix array. We discuss how the proposed approach can be incorporated in the BWT compression pipeline.
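
To make the relationship between the BWT and the suffix array concrete, the following is a minimal illustrative sketch in Python. It is not the authors' method. The function names (`suffix_array`, `bwt_from_suffix_array`, `adjacent_lcp`), the `$` sentinel, and the naive sorting-based construction are all assumptions chosen for readability. The sketch shows two standard facts: the BWT can be read directly off the suffix array, and the lengths of common prefixes between neighbouring sorted suffixes expose repeated substrings.

```python
def suffix_array(s):
    """Return the start positions of the suffixes of s in sorted order.

    Naive construction for illustration only; practical tools use
    linear-time or near-linear-time algorithms.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt_from_suffix_array(text, sentinel="$"):
    """Return (bwt, sa) for text with a unique smallest sentinel appended.

    The sentinel is an assumption of this sketch; it must not occur in text
    and must sort before every other symbol.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    sa = suffix_array(s)
    # The i-th BWT symbol is the character just before the i-th sorted suffix.
    # For the suffix starting at position 0, s[-1] wraps around to the sentinel.
    bwt = "".join(s[i - 1] for i in sa)
    return bwt, sa


def adjacent_lcp(s, sa):
    """Return common-prefix lengths of neighbouring suffixes in sorted order.

    Large values indicate long substrings that occur more than once.
    """
    def lcp(a, b):
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        return n

    return [lcp(sa[k], sa[k + 1]) for k in range(len(sa) - 1)]


if __name__ == "__main__":
    bwt, sa = bwt_from_suffix_array("ACGACG")
    print(bwt)                            # GG$AACC
    print(sa)                             # [6, 3, 0, 4, 1, 5, 2]
    print(adjacent_lcp("ACGACG$", sa))    # [0, 3, 0, 2, 0, 1]
```

In this toy input, the entry 3 in the common-prefix list corresponds to the repeat `ACG`, and the BWT output groups identical symbols into runs (`GG`, `AA`, `CC`), which is the property that later stages of a BWT compression pipeline exploit.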