Word-based block-sorting text compression

Authors:
R. Yugo Kartono Isal; Alistair Moffat
Affiliations:
The University of Melbourne, Victoria 3010, Australia; The University of Melbourne, Victoria 3010, Australia
Venue:
ACSC '01: Proceedings of the 24th Australasian Conference on Computer Science
Year:
2001

Abstract

Block-sorting is an innovative compression mechanism introduced in 1994 by Burrows and Wheeler. It involves three steps: permuting the input one block at a time using the Burrows-Wheeler Transform (BWT); applying a Move-To-Front (MTF) transform to each permuted block; and entropy coding the output with a Huffman or arithmetic coder. Until now, block-sorting implementations have assumed that the input message is a sequence of characters. In this paper we extend the block-sorting mechanism to word-based models. We also consider alternatives to the MTF transformation, and obtain improved compression results compared to MTF. For large text files, the combination of word-based modelling, the BWT, and MTF-like transformations attains excellent compression effectiveness at reasonable resource cost.
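The three block-sorting steps described above can be illustrated on a word-level alphabet. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. The helper names (`parse_words`, `to_ids`, `bwt`, `inverse_bwt`, `mtf_encode`, `mtf_decode`, `entropy_bits`, `compress_estimate`) are hypothetical. The parser, which puts alternating word and non-word tokens into a single alphabet, is an assumption, as are the id assignment by first appearance and the naive rotation-sorting BWT. The final entropy-coding step is replaced by a zeroth-order empirical entropy estimate rather than a real Huffman or arithmetic coder.

```python
import math
import re
from collections import Counter


def parse_words(text):
    # Hypothetical parser: alternating runs of word characters and
    # non-word characters; joining the tokens gives back the text.
    return re.findall(r"\w+|\W+", text)


def to_ids(tokens):
    # Ids follow order of first appearance and start at 1, because
    # id 0 is reserved for the end-of-block sentinel.
    vocab = {}
    ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in tokens]
    return ids, vocab


def bwt(ids):
    # Naive BWT over integer symbols: sort every rotation of the block.
    # O(n^2 log n); practical systems use suffix sorting instead.
    s = list(ids) + [0]  # unique sentinel, smaller than every word id
    rows = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    return [s[i - 1] for i in rows]  # last column; s[-1] is the sentinel


def inverse_bwt(last):
    # A stable sort of the last column yields the first column and links
    # each row to the row that continues the text one symbol later.
    n = len(last)
    order = sorted(range(n), key=lambda r: last[r])
    first = [last[r] for r in order]
    out, row = [], 0  # row 0 starts with the sentinel
    for _ in range(n - 1):
        row = order[row]
        out.append(first[row])
    return out


def mtf_encode(seq, alphabet_size):
    # Move-to-front: emit each symbol's current position, then move it
    # to the front, so recently seen symbols get small codes.
    table = list(range(alphabet_size))
    out = []
    for sym in seq:
        pos = table.index(sym)
        out.append(pos)
        table.insert(0, table.pop(pos))
    return out


def mtf_decode(codes, alphabet_size):
    # Inverse of mtf_encode: replay the same list updates.
    table = list(range(alphabet_size))
    out = []
    for pos in codes:
        sym = table.pop(pos)
        out.append(sym)
        table.insert(0, sym)
    return out


def entropy_bits(seq):
    # Zeroth-order empirical entropy of seq, in bits: a stand-in for the
    # output size of a real Huffman or arithmetic coder.
    n = len(seq)
    return -sum(c * math.log2(c / n) for c in Counter(seq).values())


def compress_estimate(text):
    # Word parsing, then BWT, then MTF over a (vocabulary + sentinel) alphabet.
    tokens = parse_words(text)
    ids, vocab = to_ids(tokens)
    last = bwt(ids)
    codes = mtf_encode(last, len(vocab) + 1)
    return tokens, vocab, last, codes


if __name__ == "__main__":
    text = "the cat sat on the mat, the cat sat on the hat. " * 3
    tokens, vocab, last, codes = compress_estimate(text)
    # Both stages are invertible, so the text can be rebuilt exactly.
    ids = inverse_bwt(mtf_decode(codes, len(vocab) + 1))
    words = {i: tok for tok, i in vocab.items()}
    assert "".join(words[i] for i in ids) == text
    print(len(tokens), "tokens,", len(vocab), "distinct")
    print("MTF codes:", codes)
    print("estimated size: %.1f bits" % entropy_bits(codes))
```

Because this MTF uses a plain Python list, each symbol costs time proportional to the number of distinct words. Sorting whole rotations is also quadratic in the block length. Both costs are acceptable only for toy inputs. Comparing `entropy_bits(codes)` with `entropy_bits(ids)` on a given input gives a rough sense of how far the BWT and MTF stages skew the symbol distribution toward small values.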