Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences.
Randomized algorithms.
Approximate nearest neighbors: towards removing the curse of dimensionality. STOC '98: Proceedings of the 30th Annual ACM Symposium on Theory of Computing.
A digital fountain approach to reliable distribution of bulk data. Proceedings of ACM SIGCOMM '98.
Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking (TON).
Min-wise independent permutations. Journal of Computer and System Sciences (30th Annual ACM Symposium on Theory of Computing).
Space/time trade-offs in hash coding with allowable errors. Communications of the ACM.
An approximate L1-difference algorithm for massive data streams. SIAM Journal on Computing.
Informed content delivery across adaptive overlay networks. Proceedings of ACM SIGCOMM 2002.
On the resemblance and containment of documents. SEQUENCES '97: Proceedings of Compression and Complexity of Sequences 1997.
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling. VLDB '05: Proceedings of the 31st International Conference on Very Large Data Bases.
What's new: finding significant differences in network data streams. IEEE/ACM Transactions on Networking (TON).
Data streams: algorithms and applications. Foundations and Trends in Theoretical Computer Science.
Redundancy elimination within large collections of files. ATEC '04: Proceedings of the USENIX Annual Technical Conference.
Reversible sketches: enabling monitoring and analysis over high-speed data streams. IEEE/ACM Transactions on Networking (TON).
Avoiding the disk bottleneck in the data domain deduplication file system. FAST '08: Proceedings of the 6th USENIX Conference on File and Storage Technologies.
IEEE Transactions on Knowledge and Data Engineering.
Data verification and reconciliation with generalized error-control codes. IEEE Transactions on Information Theory.
Set reconciliation with nearly optimal communication complexity. IEEE Transactions on Information Theory.
Synchronizing state with strong similarity between local and remote systems. Proceedings of the Third ACM Workshop on Mobile Cloud Computing and Services.
Performance measurement of the CCNx synchronization protocol. ANCS '13: Proceedings of the 9th ACM/IEEE Symposium on Architectures for Networking and Communications Systems.
Improving the performance of Invertible Bloom Lookup Tables. Information Processing Letters.
We describe a synopsis structure, the Difference Digest, that allows two nodes to compute the elements in their set difference in a single round, with communication overhead proportional to the size of the difference times the logarithm of the keyspace. While set reconciliation can be done efficiently using update logs, logs require overhead for every update and scale poorly when multiple parties must be reconciled. By contrast, our abstraction assumes no prior context and is useful in networking and distributed-systems applications such as trading blocks in a peer-to-peer network and synchronizing link-state databases after a partition. Our basic set-reconciliation method resembles the peeling algorithm used in Tornado codes [6], which is not surprising given the intimate connection between set difference and coding. Beyond reconciliation itself, an essential component of the Difference Digest is a new estimator for the size of the set difference that outperforms min-wise sketches [3] when the difference is small. Our experiments show that the Difference Digest is more efficient than prior approaches such as Approximate Reconciliation Trees [5] and Characteristic Polynomial Interpolation [17]. We use Difference Digests to implement a generic KeyDiff service in Linux that runs over TCP and returns the sets of keys that differ between machines.
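The core mechanism behind this style of reconciliation can be sketched as an invertible Bloom lookup table: each node hashes its keys into a small array of cells (each holding a count, an XOR of keys, and an XOR of key checksums), the nodes exchange and subtract their tables cell-wise so shared keys cancel, and the remaining difference is recovered by Tornado-code-style peeling of "pure" cells. The sketch below is illustrative, not the paper's exact implementation; the names (`make_digest`, `peel`), cell layout, hash choices, and parameters are assumptions for this example.

```python
import hashlib

K = 3  # hash functions (cells) per key -- a typical choice, assumed here

def _h(data: str) -> int:
    # 64-bit hash helper (hash choice is illustrative).
    return int.from_bytes(
        hashlib.blake2b(data.encode(), digest_size=8).digest(), "big")

def _checksum(x: int) -> int:
    # Independent checksum of a key, used to verify a cell is "pure".
    return _h(f"chk:{x}")

def _cells_for(x: int, m: int):
    # K distinct cell indices for key x (distinctness keeps counts honest).
    idxs, seed = [], 0
    while len(idxs) < K:
        i = _h(f"{seed}:{x}") % m
        if i not in idxs:
            idxs.append(i)
        seed += 1
    return idxs

def make_digest(keys, m):
    # Each cell: [count, XOR of keys, XOR of key checksums].
    cells = [[0, 0, 0] for _ in range(m)]
    for x in keys:
        for i in _cells_for(x, m):
            cells[i][0] += 1
            cells[i][1] ^= x
            cells[i][2] ^= _checksum(x)
    return cells

def subtract(a, b):
    # Cell-wise subtraction: counts subtract, XOR sums cancel shared keys.
    return [[ca[0] - cb[0], ca[1] ^ cb[1], ca[2] ^ cb[2]]
            for ca, cb in zip(a, b)]

def peel(cells):
    # Repeatedly find a pure cell (count +/-1, checksum matches), extract
    # its key, and remove that key from all its cells -- the same peeling
    # idea used to decode Tornado codes.
    m = len(cells)
    only_a, only_b = set(), set()
    progress = True
    while progress:
        progress = False
        for c in cells:
            if c[0] in (1, -1) and _checksum(c[1]) == c[2]:
                x, sign = c[1], c[0]
                (only_a if sign == 1 else only_b).add(x)
                for i in _cells_for(x, m):
                    cells[i][0] -= sign
                    cells[i][1] ^= x
                    cells[i][2] ^= _checksum(x)
                progress = True
    return only_a, only_b
```

Note that the table size `m` need only be proportional to the size of the difference, not of the sets, which is why an estimator for the difference size is the natural companion component: it tells each side how large a digest to build before the single exchange.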