The subject of this article is differential compression, the algorithmic task of finding common strings between versions of data and using them to encode one version compactly by describing it as a set of changes from its companion. A main goal of this work is to present new differencing algorithms that (i) operate at a fine granularity (the atomic unit of change), (ii) make no assumptions about the format or alignment of input data, and (iii) in practice use linear time, use constant space, and give good compression. We present new algorithms, which do not always compress optimally but use considerably less time or space than existing algorithms. One new algorithm runs in O(n) time and O(1) space in the worst case (where each unit of space contains ⌈log n⌉ bits), as compared to algorithms that run in O(n) time and O(n) space or in O(n^2) time and O(1) space. We introduce two new techniques for differential compression and apply these to give additional algorithms that improve compression and time performance. We experimentally explore the properties of our algorithms by running them on actual versioned data. Finally, we present theoretical results that limit the compression power of differencing algorithms that are restricted to making only a single pass over the data.
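
To make the task concrete, the following is a minimal sketch of differential compression in general, not one of the algorithms presented in the article. It assumes a simple delta format of "copy" and "add" commands, and it indexes every fixed-length substring of the reference version in a dictionary, so it belongs to the linear-space class of differencing algorithms rather than the constant-space one. The names `encode_delta`, `apply_delta`, and `seed_len` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def encode_delta(reference: bytes, version: bytes, seed_len: int = 4) -> List[Tuple]:
    """Encode `version` as copy/add commands relative to `reference`.

    Illustrative sketch only: the index below uses space linear in the
    length of the reference, unlike a constant-space algorithm.
    """
    # Map each seed_len-byte substring of the reference to its first offset.
    index: Dict[bytes, int] = {}
    for i in range(len(reference) - seed_len + 1):
        index.setdefault(reference[i:i + seed_len], i)

    delta: List[Tuple] = []
    literal = bytearray()  # bytes of the version with no match so far
    pos = 0
    while pos < len(version):
        start = index.get(version[pos:pos + seed_len])
        if start is None:
            # No common string begins here; emit this byte literally.
            literal.append(version[pos])
            pos += 1
            continue
        # Extend the match forward for as long as both versions agree.
        length = seed_len
        while (start + length < len(reference)
               and pos + length < len(version)
               and reference[start + length] == version[pos + length]):
            length += 1
        if literal:
            delta.append(("add", bytes(literal)))
            literal.clear()
        delta.append(("copy", start, length))
        pos += length
    if literal:
        delta.append(("add", bytes(literal)))
    return delta


def apply_delta(reference: bytes, delta: List[Tuple]) -> bytes:
    """Rebuild the version from the reference and the delta commands."""
    out = bytearray()
    for command in delta:
        if command[0] == "copy":
            _, start, length = command
            out += reference[start:start + length]
        else:
            out += command[1]
    return bytes(out)
```

Applying `apply_delta(reference, encode_delta(reference, version))` reproduces `version` for any pair of byte strings. Because the sketch works on raw bytes and matches at any offset, it makes no assumption about the format or alignment of the input, but it takes the first indexed match rather than the best one, so it does not always compress optimally.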