Sketching in adversarial environments

Authors: Ilya Mironov; Moni Naor; Gil Segev
Affiliations: Microsoft Research, Silicon Valley Campus, Mountain View, CA, USA (Mironov); Weizmann Institute of Science, Rehovot, Israel (Naor, Segev)
Venue: STOC '08: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing
Year: 2008

Abstract

We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch model, unifies the well-studied sketch and data stream models and adds a cryptographic flavor: protocols are executed in "hostile environments". It provides a framework for studying the complexity of many tasks involving massive data sets.

The adversarial sketch model consists of several participating parties: honest parties, whose goal is to compute a predetermined function of their inputs, and an adversarial party. Computation in this model proceeds in two phases. In the first phase, the adversarial party chooses the inputs of the honest parties. These inputs are sets of elements taken from a large universe, and they are provided to the honest parties online, as a sequence of insert and delete operations. Once an operation from the sequence has been processed, it is discarded and cannot be retrieved unless explicitly stored. During this phase the honest parties are not allowed to communicate. Moreover, they share no secret information, and any public information they share is known to the adversary in advance. In the second phase, the honest parties engage in a protocol to compute the predetermined function of their inputs.

In this paper we settle the complexity (up to logarithmic factors) of two fundamental problems in this model: testing whether two massive data sets are equal, and approximating the size of their symmetric difference. We construct explicit and efficient protocols with sublinear sketches of essentially optimal size, polylogarithmic update time during the first phase, and polylogarithmic communication and computation during the second phase. Our main technical contribution is an explicit and deterministic encoding scheme that enjoys two seemingly conflicting properties, incrementality and high distance, and that may be of independent interest.
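One way to see why incrementality and high distance seem to conflict is a toy linear encoding. This is our own illustration, not the construction from the paper. Represent a set S by its characteristic vector and encode it as C(S) = G * 1_S over GF(2). An insert or delete of element i adds column i of G to the codeword, so the update cost is the weight of that column; but two sets that differ only in element i then have encodings at Hamming distance exactly that same weight. Cheap updates therefore force small distance between nearby sets, and large distance forces expensive updates. The random sparse matrix and the names `make_columns`, `IncrementalEncoding`, `encode`, `hamming`, `N_BITS` and `UNIVERSE` are hypothetical choices made only for this sketch.

```python
import random


def make_columns(n_bits, universe, column_weight, seed=0):
    """Hypothetical sparse generator matrix G over GF(2), stored by column:
    columns[i] lists the distinct codeword positions where column i has a 1."""
    rng = random.Random(seed)
    return [rng.sample(range(n_bits), column_weight) for _ in range(universe)]


class IncrementalEncoding:
    """Maintains the codeword C(S) = G * 1_S over GF(2) while a stream of
    insert/delete operations modifies the set S. Only the codeword is kept;
    the operations themselves are discarded after processing."""

    def __init__(self, columns, n_bits):
        self.columns = columns
        self.codeword = [0] * n_bits

    def update(self, element):
        """Apply one insert or delete of `element` (assumed valid: inserts add
        an absent element, deletes remove a present one). Over GF(2) both add
        column `element` of G, so they flip the same positions. Returns the
        number of codeword positions touched, i.e. the update cost."""
        for pos in self.columns[element]:
            self.codeword[pos] ^= 1
        return len(self.columns[element])


def encode(columns, n_bits, elements):
    """Encode a set by streaming its elements as insert operations."""
    enc = IncrementalEncoding(columns, n_bits)
    for e in elements:
        enc.update(e)
    return enc.codeword


def hamming(a, b):
    """Number of positions in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    N_BITS, UNIVERSE = 64, 100
    base = [1, 2, 3]  # the set {1, 2, 3}; the second set adds element 4
    for weight in (3, 32):
        cols = make_columns(N_BITS, UNIVERSE, weight)
        a = encode(cols, N_BITS, base)
        b = encode(cols, N_BITS, base + [4])
        cost = len(cols[4])
        print(f"column weight {weight}: update cost {cost}, distance {hamming(a, b)}")
```

Running the script prints an update cost equal to the distance for both weights (3 and 3, then 32 and 32). The same identity holds for any linear encoding over GF(2), since the distance between C(S) and C(S with i toggled) is the weight of column i; the sketch does not show how the paper's scheme reconciles the two properties.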