Tight bounds for Lp samplers, finding duplicates in streams, and related problems

Authors:
Hossein Jowhari;Mert Sağlam;Gábor Tardos
Affiliations:
Simon Fraser University, Burnaby, BC, Canada;Simon Fraser University, Burnaby, BC, Canada;Rényi Institute of Mathematics & Simon Fraser University, Budapest, Hungary
Venue:
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2011

Citing 22
Cited 9

Monotone circuits for connectivity require super-logarithmic depth

STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Pseudorandom generators for space-bounded computations

STOC '90 Proceedings of the twenty-second annual ACM symposium on Theory of computing
On data structures and asymmetric communication complexity

STOC '95 Proceedings of the twenty-seventh annual ACM symposium on Theory of computing
Sampling from a moving window over streaming data

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
The Communication Complexity of the Universal Relation

CCC '97 Proceedings of the 12th Annual IEEE Conference on Computational Complexity
Finding frequent items in data streams

Theoretical Computer Science - Special issue on automata, languages and programming
Duplicate detection in click streams

WWW '05 Proceedings of the 14th international conference on World Wide Web
Sampling in dynamic data streams and applications

SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
An improved data stream summary: the count-min sketch and its applications

Journal of Algorithms
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Priority sampling for estimation of arbitrary subset sums

Journal of the ACM (JACM)
Finding duplicates in a data stream

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Stream sampling for variance-optimal estimation of subset sums

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Optimal sampling from sliding windows

Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Space-optimal heavy hitters with strong error bounds

Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The Data Stream Space Complexity of Cascaded Norms

FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
Finding a duplicate and a missing item in a stream

TAMC'07 Proceedings of the 4th international conference on Theory and applications of models of computation
An optimal algorithm for the distinct elements problem

Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Optimal sampling from distributed streams

Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
1-pass relative-error Lp-sampling with applications

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
On the exact space complexity of sketching and streaming small norms

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Lower bounds for sparse recovery

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms

Analyzing graph structure via linear measurements

Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Graph sketches: sparsification, spanners, and subgraphs

PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Don't let the negatives bring you down: sampling from streams of signed updates

Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches

Foundations and Trends in Databases
On the streaming complexity of computing local clustering coefficients

Proceedings of the sixth ACM international conference on Web search and data mining
Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Subconstant Error

ACM Transactions on Algorithms (TALG) - Special Issue on SODA'11
Homomorphic fingerprints under misalignments: sketching edit and shift distances

Proceedings of the forty-fifth annual ACM symposium on Theory of computing
Tight lower bound for linear sketches of moments

ICALP'13 Proceedings of the 40th international conference on Automata, Languages, and Programming - Volume Part I
Efficient sampling of non-strict turnstile data streams

FCT'13 Proceedings of the 19th international conference on Fundamentals of Computation Theory

Abstract

In this paper, we present near-optimal space bounds for $L_p$-samplers. Given a stream of updates (additions and subtractions) to the coordinates of an underlying vector $x \in \mathbb{R}^n$, a perfect $L_p$ sampler outputs the $i$-th coordinate with probability $|x_i|^p / \|x\|_p^p$. In SODA 2010, Monemizadeh and Woodruff showed polylog space upper bounds for approximate $L_p$-samplers and demonstrated various applications of them. Very recently, Andoni, Krauthgamer and Onak improved the upper bounds and gave an $O(\epsilon^{-p} \log^3 n)$ space $L_p$-sampler with $\epsilon$ relative error and constant failure rate for $p \in [1,2]$. In this work, we give another such algorithm requiring only $O(\epsilon^{-p} \log^2 n)$ space for $p \in (1,2)$. For $p \in (0,1)$, our space bound is $O(\epsilon^{-1} \log^2 n)$, while for the $p = 1$ case we have an $O(\log(1/\epsilon)\,\epsilon^{-1} \log^2 n)$ space algorithm. We also give an $O(\log^2 n)$ bits zero relative error $L_0$-sampler, improving the $O(\log^3 n)$ bits algorithm due to Frahling, Indyk and Sohler. As an application of our samplers, we give better upper bounds for the problem of finding duplicates in data streams. In case the length of the stream is longer than the alphabet size, $L_1$ sampling gives us an $O(\log^2 n)$ space algorithm, improving the previous $O(\log^3 n)$ bound due to Gopalan and Radhakrishnan. In the second part of our work, we prove an $\Omega(\log^2 n)$ lower bound for sampling from $0, \pm 1$ vectors (in this special case, the parameter $p$ is not relevant for $L_p$ sampling). This matches the space of our sampling algorithms for constant $\epsilon > 0$. We also prove tight space lower bounds for the finding duplicates and heavy hitters problems. We obtain these lower bounds using reductions from the communication complexity problem augmented indexing.
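To make the definition of a perfect Lp sampler concrete, the following is a minimal reference sketch of the target distribution only. It is not the small-space algorithm of the paper: it stores the whole vector explicitly, which a streaming sampler must avoid. The function names (`apply_updates`, `lp_distribution`, `sample_lp`) and the `(index, delta)` update format are assumptions made for illustration.

```python
import random
from collections import defaultdict


def apply_updates(updates):
    """Accumulate (index, delta) stream updates into a sparse vector x.

    Assumed update format: each item adds `delta` (positive or negative)
    to coordinate `index`. Coordinates that cancel to zero are dropped.
    """
    x = defaultdict(int)
    for i, delta in updates:
        x[i] += delta
    return {i: v for i, v in x.items() if v != 0}


def lp_distribution(x, p):
    """Exact target distribution: Pr[i] = |x_i|^p / ||x||_p^p.

    With p = 0 this is uniform over the nonzero coordinates, since
    zero coordinates were already removed by apply_updates.
    """
    weights = {i: abs(v) ** p for i, v in x.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("cannot sample from the zero vector")
    return {i: w / total for i, w in weights.items()}


def sample_lp(x, p, rng=random):
    """Draw one index from the exact Lp distribution (linear space)."""
    dist = lp_distribution(x, p)
    indices = list(dist)
    return rng.choices(indices, weights=[dist[i] for i in indices], k=1)[0]


if __name__ == "__main__":
    # Coordinate 1 is added and then removed, so it cannot be sampled.
    stream = [(0, 4), (1, 1), (0, -1), (2, -1), (1, -1)]
    x = apply_updates(stream)
    for p in (0, 1, 2):
        print(p, lp_distribution(x, p))
```

An approximate sampler with relative error ε, as discussed in the abstract, would be allowed to deviate from these probabilities by a factor of (1 ± ε) and to fail with some constant probability, in exchange for using only polylogarithmic space.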