The Effect of Flexible Parsing for Dynamic Dictionary-Based Data Compression

  • Authors:
  • Yossi Matias;Nasir Rajpoot;Cenk Sahinalp

  • Affiliations:
  • Tel Aviv University;Warwick University;Case Western Reserve University

  • Venue:
  • Journal of Experimental Algorithmics (JEA)
  • Year:
  • 2001

Abstract

We report on the performance evaluation of greedy parsing with a single step lookahead (which we call flexible parsing or FP) as an alternative to the commonly used greedy parsing (with no lookahead) scheme. Greedy parsing is the basis of most popular compression programs, including UNIX compress and gzip; however, it usually results in far from optimal parsing/compression with regard to the dictionary construction scheme in use. Flexible parsing, however, is optimal [MS99], i.e., it partitions any given input into the smallest number of phrases possible, for dictionary construction schemes which satisfy the prefix property throughout their execution.

We focus on the application of FP in the context of the LZW variant of the Lempel-Ziv'78 dictionary construction method [Wel84, ZL78], which is of considerable practical interest. We implement two compression algorithms which use (1) FP with the LZW dictionary (LZW-FP), and (2) FP with an alternative flexible dictionary (FPA, as introduced in [Hor95]). Our implementations are based on novel on-line data structures enabling us to use linear time and space. We test our implementations on a collection of input sequences which includes textual files, DNA sequences, medical images, and pseudorandom binary files, and compare our results with two of the most popular compression programs, UNIX compress and gzip. Our results demonstrate that flexible parsing is especially useful for non-textual data, on which it improves over the compression rates of compress and gzip by up to 20% and 35%, respectively.
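
The difference between greedy parsing and parsing with a single step lookahead can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed (static) dictionary instead of the dynamically built LZW dictionary, it uses plain set lookups rather than the linear-time on-line data structures mentioned in the abstract, and the function names and the toy dictionary are hypothetical. The dictionary is assumed to satisfy the prefix property (every prefix of a phrase is also a phrase) and to contain every single character of the input.

```python
def longest_match(text, i, dictionary):
    """Length of the longest dictionary phrase starting at text[i] (0 at end of text)."""
    length = 0
    # Assumption: the dictionary has the prefix property, so we can extend
    # one character at a time and stop at the first miss.
    while i + length < len(text) and text[i:i + length + 1] in dictionary:
        length += 1
    return length


def greedy_parse(text, dictionary):
    """Greedy parsing: always emit the longest phrase matching at the current position."""
    phrases, i = [], 0
    while i < len(text):
        length = longest_match(text, i, dictionary)
        if length == 0:
            raise ValueError(f"no dictionary phrase matches at position {i}")
        phrases.append(text[i:i + length])
        i += length
    return phrases


def flexible_parse(text, dictionary):
    """One-step lookahead: emit the prefix of the longest match whose
    *following* longest match reaches furthest into the text."""
    phrases, i = [], 0
    while i < len(text):
        longest = longest_match(text, i, dictionary)
        if longest == 0:
            raise ValueError(f"no dictionary phrase matches at position {i}")
        best_len, best_reach = 0, -1
        for length in range(1, longest + 1):
            reach = i + length + longest_match(text, i + length, dictionary)
            # On ties, prefer the longer current phrase (hypothetical tie-break
            # chosen here so that a phrase ending exactly at the end of the text wins).
            if reach >= best_reach:
                best_len, best_reach = length, reach
        phrases.append(text[i:i + best_len])
        i += best_len
    return phrases


# Toy prefix-closed dictionary, chosen only for illustration.
toy_dictionary = {"a", "b", "c", "d", "e", "ab", "abc", "cd", "cde"}

print(greedy_parse("abcde", toy_dictionary))    # ['abc', 'd', 'e']
print(flexible_parse("abcde", toy_dictionary))  # ['ab', 'cde']
```

On this toy input, greedy parsing commits to the longest first phrase "abc" and is then forced to emit two single characters, while the lookahead version gives up one character of the first phrase so that the next phrase can cover the rest of the input, yielding two phrases instead of three.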