We present novel work-optimal PRAM algorithms for Burrows-Wheeler (BW) compression and decompression of strings over a constant alphabet. For a string of length n, the depth of the compression algorithm is O(log^2 n), and the depth of the corresponding decompression algorithm is O(log n). These appear to be the first polylogarithmic-time, work-optimal parallel algorithms for any standard lossless compression scheme. The algorithms for the individual stages of compression and decompression may also be of independent interest: 1. a novel O(log n)-time, O(n)-work PRAM algorithm for Huffman decoding; 2. original insights into the stages of the BW compression and decompression problems, bringing out parallelism that was not readily apparent. We then map this parallelism in interesting ways onto elementary parallel routines that have O(log n)-time, O(n)-work solutions, such as: (i) prefix-sums problems with an appropriately defined associative binary operator for several stages, and (ii) list ranking for the final stage of decompression (the inverse block-sorting transform). Companion work reports empirical speedups of up to 25x for compression and up to 13x for decompression. This reflects a speedup of 70x over recent work on BW compression on GPUs.
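To make the list-ranking connection concrete, the following is a minimal sequential sketch of the inverse block-sorting transform (not the paper's parallel algorithm). Reconstructing the text amounts to following a permutation of positions (the LF-mapping) link by link; it is exactly this pointer chasing that the abstract's final stage recasts as list ranking, for which O(log n)-time, O(n)-work PRAM solutions exist. The function name and example string are illustrative, not from the source.

```python
def inverse_bwt(last_column, primary_index):
    """Sequential inverse BWT: chase the LF-mapping permutation.

    last_column:   the BWT output L (last column of the sorted
                   rotation matrix).
    primary_index: the row of the sorted matrix holding the
                   original string.
    """
    n = len(last_column)
    # Stable sort of positions by character yields, for each row i,
    # the row whose rotation follows it; stability of Python's sort
    # is essential for the mapping to be correct.
    lf = sorted(range(n), key=lambda i: last_column[i])
    out = []
    j = primary_index
    for _ in range(n):          # one pointer-chasing step per output symbol
        j = lf[j]
        out.append(last_column[j])
    return "".join(out)

# BWT of "banana$" is "annb$aa" with primary index 4.
print(inverse_bwt("annb$aa", 4))  # → banana$
```

The sequential loop is inherently a linked-list traversal of length n; the parallel algorithm referenced in the abstract replaces it with list ranking rather than iterating step by step.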