In this paper, we present near-optimal space bounds for $L_p$-samplers. Given a stream of updates (additions and subtractions) to the coordinates of an underlying vector $x \in \mathbb{R}^n$, a perfect $L_p$-sampler outputs the $i$-th coordinate with probability $|x_i|^p / \|x\|_p^p$. In SODA 2010, Monemizadeh and Woodruff showed polylog space upper bounds for approximate $L_p$-samplers and demonstrated various applications of them. Very recently, Andoni, Krauthgamer and Onak improved the upper bounds and gave an $O(\epsilon^{-p} \log^3 n)$ space $L_p$-sampler with $\epsilon$ relative error and constant failure rate for $p \in [1,2]$. In this work, we give another such algorithm requiring only $O(\epsilon^{-p} \log^2 n)$ space for $p \in (1,2)$. For $p \in (0,1)$, our space bound is $O(\epsilon^{-1} \log^2 n)$, while for the $p = 1$ case we have an $O(\log(1/\epsilon)\, \epsilon^{-1} \log^2 n)$ space algorithm. We also give an $O(\log^2 n)$-bit $L_0$-sampler with zero relative error, improving the $O(\log^3 n)$-bit algorithm due to Frahling, Indyk and Sohler. As an application of our samplers, we give better upper bounds for the problem of finding duplicates in data streams. When the stream is longer than the alphabet size, $L_1$ sampling gives us an $O(\log^2 n)$ space algorithm, thus improving the previous $O(\log^3 n)$ bound due to Gopalan and Radhakrishnan. In the second part of our work, we prove an $\Omega(\log^2 n)$ lower bound for sampling from $\{0, \pm 1\}$ vectors (in this special case, the parameter $p$ is not relevant for $L_p$ sampling). This matches the space of our sampling algorithms for constant $\epsilon > 0$. We also prove tight space lower bounds for the problems of finding duplicates and heavy hitters. We obtain these lower bounds using reductions from the communication complexity problem augmented indexing.
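To make the definition of a perfect Lp-sampler concrete, the following Python sketch computes the target distribution that such a sampler must match. It is an offline reference only, not the small-space algorithm of the paper: it stores the whole vector and therefore uses space linear in the number of nonzero coordinates. The function names `apply_updates` and `lp_distribution` are hypothetical, and treating p = 0 as the uniform distribution over nonzero coordinates is the usual convention for L0 sampling, assumed here.

```python
from collections import defaultdict


def apply_updates(updates):
    """Accumulate (index, delta) stream updates into a sparse vector.

    Coordinates whose updates cancel to zero are dropped.
    """
    x = defaultdict(float)
    for i, delta in updates:
        x[i] += delta
    return {i: v for i, v in x.items() if v != 0}


def lp_distribution(x, p):
    """Return the distribution a perfect Lp-sampler must output for x.

    Coordinate i gets probability |x_i|^p / ||x||_p^p. For p = 0 this is
    taken (by assumption) to be uniform over the nonzero coordinates.
    """
    if not x:
        raise ValueError("the zero vector has no sampling distribution")
    if p == 0:
        weights = {i: 1.0 for i in x}
    else:
        weights = {i: abs(v) ** p for i, v in x.items()}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


# A stream with additions and subtractions; coordinate 2 cancels out.
updates = [(0, 3), (1, 1), (2, 5), (2, -5), (1, -2)]
x = apply_updates(updates)      # final vector: x_0 = 3, x_1 = -1
l1 = lp_distribution(x, 1)      # weights 3 and 1 out of 4
l2 = lp_distribution(x, 2)      # weights 9 and 1 out of 10
l0 = lp_distribution(x, 0)      # uniform over the two nonzero coordinates
```

Drawing an index from one of these dictionaries (for example with `random.choices`) gives a perfect sample. The point of the space bounds above is to approximate the same distribution without ever materializing `x`.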