Block-sorting is an innovative compression mechanism introduced in 1994 by Burrows and Wheeler. It involves three steps: permuting the input one block at a time through the use of the Burrows-Wheeler Transform (BWT); applying a Move-To-Front (MTF) transform to each of the permuted blocks; and then entropy coding the output with a Huffman or arithmetic coder. Until now, block-sorting implementations have assumed that the input message is a sequence of characters. In this paper we extend the block-sorting mechanism to word-based models. We also consider other transformations as alternatives to MTF, and show that they yield better compression than MTF. For large text files, the combination of word-based modelling, BWT, and MTF-like transformations attains excellent compression effectiveness at reasonable resource cost.
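The first two steps of the pipeline can be illustrated with a minimal Python sketch that applies them to word tokens rather than characters. This is not the paper's implementation: the function names (`tokenize`, `bwt`, `mtf`, `inverse_bwt`) are hypothetical, the tokenizer that splits text into alternating runs of word and non-word characters is an assumption about what a word-based model might look like, and the BWT sorts full rotations naively instead of using an efficient suffix-sorting method. The entropy-coding step is omitted; the MTF ranks would be the input to a Huffman or arithmetic coder.

```python
import re


def tokenize(text):
    """Split text into alternating word / non-word tokens (assumed parsing rule)."""
    return re.findall(r"\w+|\W+", text)


def bwt(symbols):
    """Naive Burrows-Wheeler Transform of one block.

    Sorts all rotations of the block and returns the last column together
    with the row index of the original (unrotated) block.
    """
    n = len(symbols)
    if n == 0:
        return [], 0
    rows = sorted(range(n), key=lambda i: symbols[i:] + symbols[:i])
    last = [symbols[(i - 1) % n] for i in rows]
    return last, rows.index(0)


def inverse_bwt(last, idx):
    """Invert bwt() using a stable sort of the last column."""
    order = sorted(range(len(last)), key=lambda i: last[i])
    out, row = [], idx
    for _ in range(len(last)):
        row = order[row]
        out.append(last[row])
    return out


def mtf(symbols):
    """Move-To-Front: replace each symbol by its current rank in a list,
    then move that symbol to the front of the list."""
    table = sorted(set(symbols))  # initial ordering is an assumption
    ranks = []
    for s in symbols:
        r = table.index(s)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks


if __name__ == "__main__":
    words = tokenize("the cat and the hat and the bat")
    last, idx = bwt(words)
    ranks = mtf(last)
    print(last, idx)
    print(ranks)  # these ranks would go to the entropy coder
```

The only change needed to move from characters to words is the symbol alphabet: `bwt` and `mtf` operate on any list of comparable symbols, so passing `list("banana")` gives the character-based behaviour, while passing the output of `tokenize` gives a word-based one.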