Grammar precompression speeds up Burrows-Wheeler compression

Authors:
Juha Kärkkäinen, Pekka Mikkola, Dominik Kempa
Affiliations:
Department of Computer Science, University of Helsinki, Finland (all three authors)
Venue:
SPIRE '12: Proceedings of the 19th International Conference on String Processing and Information Retrieval
Year:
2012

Abstract

Text compression algorithms based on the Burrows-Wheeler transform (BWT) typically achieve good compression ratios but are slow compared to Lempel-Ziv-type compression algorithms. The main culprit is the time needed to compute the BWT during compression and its inverse during decompression. We propose to speed up BWT-based compression by performing a grammar-based precompression before the transform. The idea is to reduce the amount of data that the BWT and its inverse have to process. We have developed a very fast grammar precompressor based on pair replacement. Experiments show a substantial speedup in practice with no significant effect on the compression ratio.
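To make the pipeline concrete, here is a minimal Python sketch, assuming a naive pair-replacement scheme in the spirit of Re-Pair: the most frequent adjacent pair of symbols is repeatedly replaced by a fresh nonterminal, and the BWT is then applied to the shorter symbol sequence. The function names (`pair_replace`, `expand`, `bwt`, `inverse_bwt`), the `max_rules` and `min_count` parameters, and the integer sentinel are illustrative assumptions. This is not the authors' precompressor, which is engineered for speed; the naive rotation-sorting BWT also stands in for a real suffix-array construction. A complete compressor would additionally need to store the grammar rules and apply the usual post-BWT stages.

```python
from collections import Counter

SENTINEL = -1  # assumption: unique end marker, smaller than every real symbol


def pair_replace(data: bytes, max_rules: int = 1000, min_count: int = 2):
    """Hypothetical Re-Pair-style precompressor (naive, not the paper's)."""
    seq = list(data)
    rules: dict[int, tuple[int, int]] = {}  # nonterminal -> pair it replaces
    next_sym = 256  # byte values occupy 0..255
    for _ in range(max_rules):
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < min_count:
            break
        out, i = [], 0
        while i < len(seq):  # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        rules[next_sym] = pair
        next_sym += 1
        seq = out
    return seq, rules


def expand(seq: list[int], rules: dict[int, tuple[int, int]]) -> bytes:
    """Undo pair replacement by expanding nonterminals with an explicit stack."""
    out, stack = [], list(reversed(seq))
    while stack:
        sym = stack.pop()
        if sym in rules:
            left, right = rules[sym]
            stack.append(right)  # push right first so left is expanded first
            stack.append(left)
        else:
            out.append(sym)
    return bytes(out)


def bwt(seq: list[int]) -> list[int]:
    """Naive BWT: sort all rotations, return the last column."""
    s = list(seq) + [SENTINEL]
    rows = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    return [s[i - 1] for i in rows]


def inverse_bwt(last: list[int]) -> list[int]:
    """Invert the BWT with the LF mapping."""
    counts = Counter(last)
    first_pos, total = {}, 0
    for c in sorted(counts):  # first row of each symbol in the first column
        first_pos[c] = total
        total += counts[c]
    seen, lf = Counter(), []
    for c in last:
        lf.append(first_pos[c] + seen[c])
        seen[c] += 1
    out, row = [], 0  # row 0 is the rotation that starts with the sentinel
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]
    out.reverse()
    return out


if __name__ == "__main__":
    text = b"abracadabra abracadabra abracadabra"
    seq, rules = pair_replace(text)
    restored = expand(inverse_bwt(bwt(seq)), rules)
    assert restored == text
    print(len(text), "bytes ->", len(seq), "symbols,", len(rules), "rules")
```

The round trip `expand(inverse_bwt(bwt(seq)), rules)` recovers the original bytes, while both `bwt` and `inverse_bwt` operate on `seq`, which is shorter than the input whenever some pair repeats. That reduction in input length is the effect the abstract relies on.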