An approximation to the greedy algorithm for differential compression

Authors:
R. C. Agarwal;K. Gupta;S. Jain;S. Amalapurapu
Venue:
IBM Journal of Research and Development - Spintronics
Year:
2006

Abstract

We present a new differential compression algorithm that combines the hash value techniques and suffix array techniques of previous work. The term "differential compression" refers to encoding a file (a version file) as a set of changes with respect to another file (a reference file). Previous differential compression algorithms can be shown empirically to run in linear time, but they have certain drawbacks; namely, they do not find the best matches for every offset of the version file. Our algorithm, hsadelta (hash suffix array delta), finds the best matches for every offset of the version file, with respect to a certain granularity and above a certain length threshold. The algorithm has two variations depending on how we choose the block size. We show that if the block size is kept fixed, the compression performance of the algorithm is similar to that of the greedy algorithm, without the associated expensive space and time requirements. If the block size is varied linearly with the reference file size, the algorithm can run in linear time and constant space. We also show empirically that the algorithm performs better than other state-of-the-art differential compression algorithms in terms of compression and is comparable in speed.
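To make the idea of encoding a version file as changes against a reference file concrete, here is a minimal sketch in Python. It is not hsadelta: it indexes fixed-size reference blocks in a plain dictionary instead of the sorted hash-value structure the abstract alludes to, it only extends matches forward, and every name in it (`build_block_index`, `encode_delta`, `decode_delta`, `scaled_block_size`, the `("copy", ...)` and `("add", ...)` operations) is a hypothetical choice made for illustration. It does show the two properties the abstract mentions: matches are found only at block granularity and only above a length threshold (here, one block), and the block size can be scaled with the reference size to bound the number of indexed blocks.

```python
from collections import defaultdict


def scaled_block_size(reference_len: int, max_blocks: int, min_size: int = 4) -> int:
    """Hypothetical helper: grow the block size linearly with the reference
    length so that at most `max_blocks` blocks are ever indexed."""
    return max(min_size, -(-reference_len // max_blocks))  # ceiling division


def build_block_index(reference: bytes, block_size: int) -> dict:
    """Map each non-overlapping, full-size reference block to its start offsets."""
    index = defaultdict(list)
    for start in range(0, len(reference) - block_size + 1, block_size):
        index[reference[start:start + block_size]].append(start)
    return index


def encode_delta(reference: bytes, version: bytes, block_size: int = 4) -> list:
    """Encode `version` as a list of copy/add operations against `reference`."""
    index = build_block_index(reference, block_size)
    ops = []
    literal = bytearray()  # bytes with no match, flushed as one "add"
    pos = 0
    while pos < len(version):
        best_len, best_ref = 0, -1
        # Candidate matches start at block-aligned reference offsets only.
        for ref_start in index.get(version[pos:pos + block_size], ()):
            length = block_size
            # Extend the match forward byte by byte.
            while (ref_start + length < len(reference)
                   and pos + length < len(version)
                   and reference[ref_start + length] == version[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_ref = length, ref_start
        if best_len:
            if literal:
                ops.append(("add", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", best_ref, best_len))
            pos += best_len
        else:
            literal.append(version[pos])
            pos += 1
    if literal:
        ops.append(("add", bytes(literal)))
    return ops


def decode_delta(reference: bytes, ops: list) -> bytes:
    """Rebuild the version file from the reference and the delta operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += reference[start:start + length]
        else:
            out += op[1]
    return bytes(out)


reference = b"the quick brown fox jumps over the lazy dog"
version = b"the quick red fox jumps over the lazy dog!"
delta = encode_delta(reference, version, block_size=4)
restored = decode_delta(reference, delta)
```

Decoding needs only the reference file and the operation list, which is why the delta can stand in for the version file. With a fixed `block_size`, the index grows with the reference; choosing the block size with something like `scaled_block_size` keeps the number of indexed blocks bounded, at the cost of missing matches shorter than a block.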