When a file is to be transmitted from a sender to a recipient and when the latter already has a file somewhat similar to it, remote differential compression seeks to determine the similarities interactively so as to transmit only the part of the new file not already in the recipient's old file. Content-dependent chunking means that the sender and recipient chop their files into chunks, with the cutpoints determined by some internal features of the files, so that when segments of the two files agree (possibly in different locations within the files) the cutpoints in such segments tend to be in corresponding locations, and so the chunks agree. By exchanging hash values of the chunks, the sender and recipient can determine which chunks of the new file are absent from the old one and thus need to be transmitted. We propose two new algorithms for content-dependent chunking, and we compare their behavior, on random files, with each other and with previously used algorithms. One of our algorithms, the local maximum chunking method, has been implemented and found to work better in practice than previously used algorithms. Theoretical comparisons between the various algorithms can be based on several criteria, most of which seek to formalize the idea that chunks should be neither too small (so that hashing and sending hash values become inefficient) nor too large (so that agreements of entire chunks become unlikely). We propose a new criterion, called the slack of a chunking method, which seeks to measure how much of an interval of agreement between two files is wasted because it lies in chunks that don't agree. Finally, we show how to efficiently find the cutpoints for local maximum chunking.
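The chunk-and-compare scheme described above can be illustrated with a minimal Python sketch. The abstract does not define local maximum chunking, so the rule used here is an assumption: a position is a cutpoint when its hash value is strictly greater than the hash values of all positions within a fixed horizon on either side. The function names, the `horizon` and `window` parameters, and the choice of CRC-32 and SHA-256 are all hypothetical and chosen only for illustration. The scan is a naive one, not the efficient cutpoint search the paper refers to.

```python
import hashlib
import zlib


def position_hashes(data: bytes, window: int = 4) -> list:
    """Hash the `window` bytes ending at each position (fewer near the start).

    CRC-32 is an illustrative stand-in for whatever hash a real system uses.
    """
    return [zlib.crc32(data[max(0, i - window + 1): i + 1])
            for i in range(len(data))]


def local_max_cutpoints(data: bytes, horizon: int = 8, window: int = 4) -> list:
    """Return cut offsets under the assumed local-maximum rule.

    Position i yields a cut after byte i when its hash is strictly greater
    than every other hash within `horizon` positions on both sides.
    Naive O(n * horizon) scan, for clarity only.
    """
    h = position_hashes(data, window)
    cuts = []
    for i in range(horizon, len(h) - horizon):
        neighbours = range(i - horizon, i + horizon + 1)
        if all(h[i] > h[j] for j in neighbours if j != i):
            cuts.append(i + 1)
    return cuts


def chunks(data: bytes, horizon: int = 8, window: int = 4) -> list:
    """Split data at the cutpoints; the chunks concatenate back to data."""
    bounds = [0] + local_max_cutpoints(data, horizon, window) + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def missing_chunks(old: bytes, new: bytes, horizon: int = 8) -> list:
    """Chunks of `new` whose hash does not occur among the chunks of `old`.

    In a real exchange only the hash values would cross the network; here
    both files are local, so the comparison is done in one place.
    """
    known = {hashlib.sha256(c).digest() for c in chunks(old, horizon)}
    return [c for c in chunks(new, horizon)
            if hashlib.sha256(c).digest() not in known]
```

Because the cutpoints depend only on nearby content, inserting a few bytes into a file should disturb only the chunks around the insertion, so `missing_chunks(old, new)` would typically return a small fraction of `new`. Under the assumed rule, two cutpoints can never lie within `horizon` positions of each other, which bounds the chunk size from below.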