Fast parallel algorithms for universal lossless source coding

  • Authors:
  • Yoram Bresler; Dror Baron

  • Venue:
  • Fast parallel algorithms for universal lossless source coding
  • Year:
  • 2003

Abstract

Most data compression research in recent years has focused on lossy compression of audio, images, and video, but lossless source coding remains important for compressing text files, executables, financial and medical data, and similar content. When the statistics of the source are unknown, a universal method that estimates a model of the source must be used. This dissertation focuses on fast algorithms for universal lossless source coding. We first identify inherent redundancies in previous uses of the Burrows-Wheeler transform (BWT), an invertible permutation transform that has been proposed for lossless compression, and offer several improvements to the previous state of the art in BWT-based compression and semipredictive encoding. These improvements yield an O(N) nonsequential semipredictive encoder whose redundancy with respect to any tree source of unbounded depth is O(1) bits per state above Rissanen's redundancy bound. Next, we develop parallel algorithms for universal lossless coding. We bound the redundancy of two-part codes for independent and identically distributed (i.i.d.) sequences and show how two-part codes can be used for distributed compression. We then describe our parallel compression algorithm, which is the main contribution of the dissertation. We partition the length-N input into B blocks, accumulate statistical information on all B blocks in parallel, estimate the single minimum description length (MDL) source underlying all B blocks, and encode the blocks in parallel. We provide an O(N/B) complexity parallel algorithm that compresses almost as well as the best serial algorithms. Our last contribution is a new suffix lists data structure that leads to several efficient algorithms for implementing the BWT. The distinguishing feature of our algorithms is that they are simple enough to be implemented in hardware. The O(N/B) parallel compression algorithm estimates the MDL source among all tree sources whose maximal depth is log(N/B).
This algorithm can be extended to parallel algorithms that support unbounded context depths. Such an extension would provide low redundancy over a much broader class of sources, and may enable new applications that have so far been limited by the throughput bottleneck of serial compression algorithms.
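The two-phase structure of the parallel algorithm described above (gather statistics from all B blocks in parallel, fit one shared model, then code every block in parallel with that model) can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm and rests on several assumptions. All function names are hypothetical. It computes ideal code lengths instead of running an arithmetic coder. It searches only full context models of depth d <= max_depth rather than arbitrary tree sources. It charges the standard asymptotic MDL cost of (|A| - 1)/2 * log2(N) bits per context. Contexts never cross block boundaries. It makes one pair of passes per candidate depth, so it does not reproduce the O(N/B) bound. The thread pool only shows where the parallelism sits; CPython threads do not speed up this pure-Python work.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def context_at(block, i, depth):
    # Up to `depth` preceding symbols, never reaching into a neighbouring block,
    # so each block can be processed independently (an assumption of this sketch).
    return tuple(block[max(0, i - depth):i])


def block_counts(block, depth):
    """Phase 1 for one block: count (context, symbol) occurrences."""
    counts = Counter()
    for i, sym in enumerate(block):
        counts[(context_at(block, i, depth), sym)] += 1
    return counts


def block_bits(block, depth, counts, ctx_totals):
    """Phase 2 for one block: ideal code length in bits under the shared model."""
    bits = 0.0
    for i, sym in enumerate(block):
        ctx = context_at(block, i, depth)
        # Every (ctx, sym) pair here was counted in phase 1, so the probability is > 0.
        bits -= math.log2(counts[(ctx, sym)] / ctx_totals[ctx])
    return bits


def parallel_mdl_code_length(data, num_blocks, max_depth, alphabet_size):
    """Return (best_depth, total_bits) over full context models of depth <= max_depth.

    Hypothetical helper; assumes non-empty `data` (a str, bytes, or list).
    """
    n = len(data)
    size = -(-n // num_blocks)  # ceiling division: at most num_blocks blocks
    blocks = [data[i:i + size] for i in range(0, n, size)]
    best = None
    with ThreadPoolExecutor() as pool:
        for depth in range(max_depth + 1):
            # Phase 1: per-block statistics in parallel, merged into one model.
            merged = Counter()
            for c in pool.map(partial(block_counts, depth=depth), blocks):
                merged.update(c)
            ctx_totals = Counter()
            for (ctx, _sym), k in merged.items():
                ctx_totals[ctx] += k
            # Approximate MDL parameter cost: (|A| - 1)/2 * log2(N) bits per context.
            param_bits = len(ctx_totals) * (alphabet_size - 1) / 2 * math.log2(n)
            # Phase 2: code every block in parallel with the single shared model.
            data_bits = sum(pool.map(
                partial(block_bits, depth=depth, counts=merged, ctx_totals=ctx_totals),
                blocks))
            total = data_bits + param_bits
            if best is None or total < best[1]:
                best = (depth, total)
    return best
```

For example, `parallel_mdl_code_length("ab" * 500, num_blocks=4, max_depth=3, alphabet_size=2)` selects depth 1. Once the previous symbol is known, the alternating sequence is deterministic, so the data cost drops to zero and only the parameter cost of three contexts remains. Deeper models add contexts without reducing the data cost. In the setting described in the abstract, max_depth would be tied to log(N/B).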