An approach to phrase selection for offline data compression

Authors:
A. Turpin; W. F. Smyth
Affiliations:
Curtin University of Technology, Perth, Western Australia, 6845; Curtin University of Technology, Perth, Western Australia, 6845
Venue:
ACSC '02 Proceedings of the twenty-fifth Australasian conference on Computer science - Volume 4
Year:
2002

Citing (5):
Data compression via textual substitution. Journal of the ACM (JACM)
Experiments in text file compression. Communications of the ACM
General-purpose compression for efficient retrieval. Journal of the American Society for Information Science and Technology
Data Compression Using Long Common Strings. DCC '99 Proceedings of the Conference on Data Compression
A Mathematical Theory of Communication

Cited (3):
Computing quasi suffix arrays. Journal of Automata, Languages and Combinatorics - Special issue: Selected papers of the 13th Australasian workshop on combinatorial algorithms
PPM with the extended alphabet. Information Sciences: an International Journal
Improving semistatic compression via phrase-based modeling. Information Processing and Management: an International Journal


Abstract

Several recently published offline data compression schemes expend large amounts of computing resources when encoding a file, but decode the file quickly. These compressors work by identifying phrases in the input data and storing the data as a series of pointers to these phrases. This paper explores the application of an algorithm for computing all repeating substrings within a string to phrase selection in an offline data compressor. Using our approach, we obtain compression similar to that of the best known offline compressors on genetic data, but poor results on general text. It seems, however, that an alternative approach based on selecting repeating substrings is feasible.
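
The general idea of phrase-based offline compression described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the repeat finder below is a naive enumeration rather than an efficient method for computing all repeating substrings, and the function names (`repeating_substrings`, `select_phrases`, `encode`, `decode`), the saving estimate, and the greedy longest-match parse are all assumptions made purely for illustration.

```python
from collections import defaultdict


def repeating_substrings(text, min_len=2):
    """Map every substring of length >= min_len that occurs at least
    twice (overlaps allowed) to its start positions. Naive enumeration,
    quadratic in the input size; for illustration only."""
    positions = defaultdict(list)
    n = len(text)
    for length in range(min_len, n):
        for start in range(n - length + 1):
            positions[text[start:start + length]].append(start)
    return {s: p for s, p in positions.items() if len(p) >= 2}


def select_phrases(text, max_phrases=4, min_len=2):
    """Pick phrases by a crude saving estimate, (occurrences - 1) * length.
    Assumption: overlaps and the cost of storing the phrase table are ignored."""
    repeats = repeating_substrings(text, min_len)
    ranked = sorted(
        repeats,
        key=lambda s: ((len(repeats[s]) - 1) * len(s), len(s), s),
        reverse=True,
    )
    return ranked[:max_phrases]


def encode(text, phrases):
    """Greedy longest-match parse: emit a phrase index (int) where a phrase
    matches, otherwise a literal character (str)."""
    by_length = sorted(range(len(phrases)), key=lambda i: len(phrases[i]), reverse=True)
    tokens = []
    pos = 0
    while pos < len(text):
        for i in by_length:
            if text.startswith(phrases[i], pos):
                tokens.append(i)
                pos += len(phrases[i])
                break
        else:
            # No phrase starts here: fall back to a literal.
            tokens.append(text[pos])
            pos += 1
    return tokens


def decode(tokens, phrases):
    """Decoding is a single pass of table lookups, which is why it is fast."""
    return "".join(phrases[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    data = "ACGTACGTTTACGT"
    table = select_phrases(data)
    coded = encode(data, table)
    print(table, coded, decode(coded, table) == data)
```

The sketch mirrors the asymmetry noted in the abstract: phrase selection does the expensive work at encoding time, while decoding only concatenates table entries and literals. A real compressor would also need to account for the cost of storing the phrase table and of coding the pointers.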