We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch model, unifies the well-studied sketch and data stream models and adds a cryptographic flavor by considering the execution of protocols in "hostile environments". It provides a framework for studying the complexity of many tasks involving massive data sets. The adversarial sketch model consists of several participating parties: honest parties, whose goal is to compute a pre-determined function of their inputs, and an adversarial party. Computation in this model proceeds in two phases. In the first phase, the adversarial party chooses the inputs of the honest parties. These inputs are sets of elements taken from a large universe, and they are provided to the honest parties in an on-line manner as a sequence of insert and delete operations. Once an operation from the sequence has been processed, it is discarded and cannot be retrieved unless explicitly stored. During this phase the honest parties are not allowed to communicate. Moreover, they do not share any secret information, and any public information they share is known to the adversary in advance. In the second phase, the honest parties engage in a protocol in order to compute a pre-determined function of their inputs. In this paper we settle the complexity (up to logarithmic factors) of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. We construct explicit and efficient protocols with sublinear sketches of essentially optimal size, poly-logarithmic update time during the first phase, and poly-logarithmic communication and computation during the second phase. Our main technical contribution, which may be of independent interest, is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties: incrementality and high distance.
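To make the first-phase interface concrete, here is a minimal Python sketch of a summary that is maintained under insert and delete operations and later compared for equality. It is not the encoding scheme of the paper. It is a standard polynomial fingerprint, and it assumes both parties use the same evaluation point that the adversary does not know, which is exactly the kind of shared secret the adversarial sketch model rules out. The class name `IncrementalFingerprint`, the helper `fresh_point`, and the modulus `P` are hypothetical names chosen for illustration, and elements are assumed to be non-negative integers.

```python
import secrets

# Hypothetical modulus for illustration: the Mersenne prime 2^61 - 1.
P = (1 << 61) - 1


def fresh_point():
    """Pick a random evaluation point in [2, P - 1]."""
    return 2 + secrets.randbelow(P - 2)


class IncrementalFingerprint:
    """Toy summary of a set: the sum of point^e mod P over its elements e.

    Each update touches only the stored value, so an operation can be
    discarded as soon as it has been processed.
    """

    def __init__(self, point):
        self.point = point  # assumption: shared and hidden from the adversary
        self.value = 0

    def insert(self, element):
        # Add the term contributed by `element`.
        self.value = (self.value + pow(self.point, element, P)) % P

    def delete(self, element):
        # Remove the term contributed by `element` (assumed to be present).
        self.value = (self.value - pow(self.point, element, P)) % P


def looks_equal(sketch_a, sketch_b):
    """Second-phase check: compare the two short summaries."""
    return sketch_a.value == sketch_b.value
```

Two parties that process different operation sequences ending in the same set hold the same value, because the summary depends only on the final set. If an adversary knows the evaluation point in advance, it can try to choose two different sets whose summaries collide, which is why this toy version does not meet the requirements of the model described above.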