Modeling Delta Encoding of Compressed Files

  • Authors:
  • S. T. Klein; T. C. Serebro; D. Shapira

  • Affiliations:
  • Bar Ilan University, Israel; Bar Ilan University, Israel; Ashkelon Acad. Colleges, Israel

  • Venue:
  • DCC '06 Proceedings of the Data Compression Conference
  • Year:
  • 2006

Abstract

We introduce a new model of delta encoding, that of Compressed Differencing. Given two files, at least one of which is in compressed form, the goal is to create a third file, the delta file of the two original files, in time proportional to the size of the input, that is, without decompressing the compressed files. If both files, S and T, are compressed using the same static Huffman code, the differencing file can be generated in the traditional way (using a sliding window) directly on the compressed files. The resulting delta encoding is at least as efficient as the delta encoding generated on the original files S and T, because common substrings of S and T are still common substrings of the compressed versions of S and T. However, the reverse is not necessarily true, since common substrings of the compressed files can extend beyond codeword boundaries.
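The following Python sketch illustrates the abstract's central observation under stated assumptions. A single static Huffman code is built from the concatenation of S and T (an assumption for illustration; the abstract only requires that both files share the same static code). Both files are then encoded to bit strings, and a greedy copy/add delta is computed directly on the bits. The helper names (`huffman_code`, `encode`, `delta`, `apply_delta`, `boundaries`), the sample strings, and the 8-bit minimum match length are hypothetical. The naive quadratic longest-match search stands in for the sliding-window matcher the abstract mentions; this is not the authors' algorithm.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a static Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(text)
    tie = count()  # unique tie-breaker so heap entries never compare dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate alphabet: give the lone symbol a 1-bit codeword
        return {sym: "0" for sym in freq}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(text: str, code: dict[str, str]) -> str:
    """Concatenate the codewords of text into one bit string."""
    return "".join(code[ch] for ch in text)


def boundaries(text: str, code: dict[str, str]) -> set[int]:
    """Bit offsets at which codewords start in encode(text), plus the end offset."""
    pos, out = 0, {0}
    for ch in text:
        pos += len(code[ch])
        out.add(pos)
    return out


def delta(src: str, tgt: str, min_len: int = 8) -> list[tuple]:
    """Greedy copy/add delta of tgt against src using a naive longest-match search."""
    ops, i = [], 0
    while i < len(tgt):
        best_off, best_len = -1, 0
        for off in range(len(src)):
            n = 0
            while off + n < len(src) and i + n < len(tgt) and src[off + n] == tgt[i + n]:
                n += 1
            if n > best_len:
                best_off, best_len = off, n
        if best_len >= min_len:
            ops.append(("copy", best_off, best_len))
            i += best_len
        elif ops and ops[-1][0] == "add":
            ops[-1] = ("add", ops[-1][1] + tgt[i])  # extend the pending literal run
            i += 1
        else:
            ops.append(("add", tgt[i]))
            i += 1
    return ops


def apply_delta(src: str, ops: list[tuple]) -> str:
    """Rebuild the target bit string from src and a list of copy/add operations."""
    out = []
    for op in ops:
        if op[0] == "copy":
            _, off, n = op
            out.append(src[off:off + n])
        else:
            out.append(op[1])
    return "".join(out)


if __name__ == "__main__":
    S = "abracadabra, abracadabra"
    T = "abracadabra! cadabra"
    code = huffman_code(S + T)  # one static code shared by both files (assumption)
    cs, ct = encode(S, code), encode(T, code)
    # A common substring of S and T is also a common substring of the bit strings.
    assert encode("cadabra", code) in cs and encode("cadabra", code) in ct
    ops = delta(cs, ct)
    assert apply_delta(cs, ops) == ct
    bs, bt, pos = boundaries(S, code), boundaries(T, code), 0
    for op in ops:
        if op[0] == "copy":
            _, off, n = op
            aligned = off in bs and off + n in bs and pos in bt and pos + n in bt
            print(f"copy src[{off}:{off + n}] -> tgt[{pos}:{pos + n}] aligned={aligned}")
            pos += n
        else:
            print(f"add {op[1]!r}")
            pos += len(op[1])
```

A copy reported with `aligned=False` starts or ends inside a codeword in S or T. Such a match may extend beyond codeword boundaries, so it need not correspond to any common substring of the uncompressed files. This is the direction in which, per the abstract, the correspondence can fail.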