Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences.
Randomized algorithms.
Approximate nearest neighbors: towards removing the curse of dimensionality. STOC '98: Proceedings of the 30th Annual ACM Symposium on Theory of Computing.
A digital fountain approach to reliable distribution of bulk data. Proceedings of ACM SIGCOMM '98.
Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking (TON).
Min-wise independent permutations. Journal of Computer and System Sciences (30th Annual ACM Symposium on Theory of Computing).
Space/time trade-offs in hash coding with allowable errors. Communications of the ACM.
An approximate L1-difference algorithm for massive data streams. SIAM Journal on Computing.
Informed content delivery across adaptive overlay networks. Proceedings of ACM SIGCOMM 2002.
On the resemblance and containment of documents. SEQUENCES '97: Proceedings of Compression and Complexity of Sequences 1997.
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling. VLDB '05: Proceedings of the 31st International Conference on Very Large Data Bases.
What's new: finding significant differences in network data streams. IEEE/ACM Transactions on Networking (TON).
Data streams: algorithms and applications. Foundations and Trends in Theoretical Computer Science.
Redundancy elimination within large collections of files. ATEC '04: Proceedings of the USENIX Annual Technical Conference.
Reversible sketches: enabling monitoring and analysis over high-speed data streams. IEEE/ACM Transactions on Networking (TON).
Avoiding the disk bottleneck in the data domain deduplication file system. FAST '08: Proceedings of the 6th USENIX Conference on File and Storage Technologies.
IEEE Transactions on Knowledge and Data Engineering.
Data verification and reconciliation with generalized error-control codes. IEEE Transactions on Information Theory.
Set reconciliation with nearly optimal communication complexity. IEEE Transactions on Information Theory.
Synchronizing state with strong similarity between local and remote systems. Proceedings of the Third ACM Workshop on Mobile Cloud Computing and Services.
Performance measurement of the CCNx synchronization protocol. ANCS '13: Proceedings of the 9th ACM/IEEE Symposium on Architectures for Networking and Communications Systems.
Improving the performance of Invertible Bloom Lookup Tables. Information Processing Letters.
We describe a synopsis structure, the Difference Digest, that allows two nodes to compute the elements in their set difference in a single round, with communication overhead proportional to the size of the difference times the logarithm of the keyspace. While set reconciliation can be done efficiently using update logs, logs require overhead for every update and scale poorly when multiple parties must be reconciled. By contrast, our abstraction assumes no prior context and is useful in networking and distributed-systems applications such as trading blocks in a peer-to-peer network and synchronizing link-state databases after a partition. Our basic set-reconciliation method resembles the peeling algorithm used in Tornado codes [6], which is not surprising given the intimate connection between set difference and coding. Beyond reconciliation itself, an essential component of the Difference Digest is a new estimator for the size of the set difference that outperforms min-wise sketches [3] when the difference is small. Our experiments show that the Difference Digest is more efficient than prior approaches such as Approximate Reconciliation Trees [5] and Characteristic Polynomial Interpolation [17]. We use Difference Digests to implement a generic KeyDiff service in Linux that runs over TCP and returns the sets of keys that differ between machines.
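The core mechanism behind this style of reconciliation can be sketched as an invertible Bloom lookup table: each node hashes its keys into a small array of cells (each holding a count, an XOR of keys, and an XOR of key checksums), the nodes exchange and subtract their tables cell-wise so shared keys cancel, and the remaining difference is recovered by Tornado-code-style peeling of "pure" cells. The sketch below is illustrative, not the paper's exact implementation; the names (`make_digest`, `peel`), cell layout, hash choices, and parameters are assumptions for this example.

```python
import hashlib

K = 3  # hash functions (cells) per key -- a typical choice, assumed here

def _h(data: str) -> int:
    # 64-bit hash helper (hash choice is illustrative).
    return int.from_bytes(
        hashlib.blake2b(data.encode(), digest_size=8).digest(), "big")

def _checksum(x: int) -> int:
    # Independent checksum of a key, used to verify a cell is "pure".
    return _h(f"chk:{x}")

def _cells_for(x: int, m: int):
    # K distinct cell indices for key x (distinctness keeps counts honest).
    idxs, seed = [], 0
    while len(idxs) < K:
        i = _h(f"{seed}:{x}") % m
        if i not in idxs:
            idxs.append(i)
        seed += 1
    return idxs

def make_digest(keys, m):
    # Each cell: [count, XOR of keys, XOR of key checksums].
    cells = [[0, 0, 0] for _ in range(m)]
    for x in keys:
        for i in _cells_for(x, m):
            cells[i][0] += 1
            cells[i][1] ^= x
            cells[i][2] ^= _checksum(x)
    return cells

def subtract(a, b):
    # Cell-wise subtraction: counts subtract, XOR sums cancel shared keys.
    return [[ca[0] - cb[0], ca[1] ^ cb[1], ca[2] ^ cb[2]]
            for ca, cb in zip(a, b)]

def peel(cells):
    # Repeatedly find a pure cell (count +/-1, checksum matches), extract
    # its key, and remove that key from all its cells -- the same peeling
    # idea used to decode Tornado codes.
    m = len(cells)
    only_a, only_b = set(), set()
    progress = True
    while progress:
        progress = False
        for c in cells:
            if c[0] in (1, -1) and _checksum(c[1]) == c[2]:
                x, sign = c[1], c[0]
                (only_a if sign == 1 else only_b).add(x)
                for i in _cells_for(x, m):
                    cells[i][0] -= sign
                    cells[i][1] ^= x
                    cells[i][2] ^= _checksum(x)
                progress = True
    return only_a, only_b
```

Note that the table size `m` need only be proportional to the size of the difference, not of the sets, which is why an estimator for the difference size is the natural companion component: it tells each side how large a digest to build before the single exchange.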