Modifications of the Burrows and Wheeler Data Compression Algorithm

  • Authors:
  • Bernhard Balkenhol;Stefan Kurtz;Yuri M. Shtarkov

  • Venue:
  • DCC '99 Proceedings of the Conference on Data Compression
  • Year:
  • 1999

Abstract

In 1994 Burrows and Wheeler [3] described a universal data compression algorithm (BW-algorithm, for short) which achieved compression rates close to the best known compression rates. Due to its simplicity, the algorithm can be implemented with relatively low complexity. Fenwick [5] described ideas to improve the efficiency (i.e., the compression rate) and complexity of the BW-algorithm. He also discussed relationships of the algorithm with other compression methods. Schindler [12] proposed a Burrows and Wheeler Transformation (BWT, for short) that is based on a limited ordering. This speeds up compression, but slows down decompression and slightly decreases the efficiency. Larsson [8] described the relationship of the BWT with suffix trees and with context trees. Sadakane [11] suggested a method to compute the BWT faster, and compared it to other methods. Recently Balkenhol and Kurtz [1] gave a thorough analysis of the BWT from an information-theoretic point of view. They described implementation techniques for data compression algorithms based on the BWT, and developed a program with a better compression rate.

In this paper we improve upon these previous results on the BW-algorithm. Based on the context tree model, we consider the specific statistical properties of the data at the output of the BWT. We describe six important properties, three of which have not been described elsewhere. These considerations lead to modifications of the coding method, which in turn improve the coding efficiency. We briefly describe how to compute the BWT with low complexity in time and space, using suffix trees in two different representations. Finally, we present experimental results on the compression rate and running time of our method, and compare these results to previous achievements. More references on the methods described in this paper can be found in [1, 5].
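For readers unfamiliar with the transformation itself, the following is a minimal sketch of a textbook BWT and its inverse. It is not the method of the paper: it sorts all rotations naively instead of using suffix trees, and the function names `bwt` and `inverse_bwt` as well as the end-of-text sentinel convention are assumptions made for illustration only.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform (illustrative, not the paper's method).

    A sentinel that does not occur in the text is appended so that the
    transform can be inverted without storing a row index.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # Sort all cyclic rotations of s; quadratic space, fine for a demo only.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The output is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "\0") -> str:
    """Invert the naive BWT by rebuilding the sorted rotation matrix."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepend the last column to every row, then re-sort the rows.
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

With `$` as the sentinel, `bwt("banana", "$")` yields `"annb$aa"`: equal characters are grouped into runs because they share similar contexts. The statistical properties of this kind of output are what the coding stage exploits.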