Fast algorithms with preprocessing for matrix-vector multiplication problems
Journal of Complexity
Oblivious data structures: applications to cryptography
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
Balls and bins: a study in negative dependence
Random Structures & Algorithms
Modern computer algebra
A small approximately min-wise independent family of hash functions
Journal of Algorithms
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Estimating Rarity and Similarity over Data Stream Windows
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
Comparing Data Streams Using Hamming Norms (How to Zero In)
IEEE Transactions on Knowledge and Data Engineering
Sampling in dynamic data streams and applications
SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
Detecting malicious network traffic using inverse distributions of packet contents
Proceedings of the 2005 ACM SIGCOMM workshop on Mining network data
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling
VLDB '05 Proceedings of the 31st international conference on Very large data bases
A dip in the reservoir: maintaining sample synopses of evolving datasets
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Data streams: algorithms and applications
Foundations and Trends® in Theoretical Computer Science
Random Sampling for Continuous Streams with Arbitrary Updates
IEEE Transactions on Knowledge and Data Engineering
Counting distinct items over update streams
Theoretical Computer Science
Maintaining bernoulli samples over evolving multisets
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Uniform Hashing in Constant Time and Optimal Space
SIAM Journal on Computing
Approximate sparse recovery: optimizing time and measurements
Proceedings of the forty-second ACM symposium on Theory of computing
An optimal algorithm for the distinct elements problem
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
1-pass relative-error Lp-sampling with applications
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Tight bounds for Lp samplers, finding duplicates in streams, and related problems
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Don't let the negatives bring you down: sampling from streams of signed updates
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Improved sketching of hamming distance with error correcting
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Hi-index | 0.00 |
We study the problem of generating a large sample from a data stream of elements (i,v), where the sample consists of pairs (i,Ci) for Ci=∑(i,v)∈streamv. We consider strict turnstile streams and general non-strict turnstile streams, in which Ci may be negative. Our sample is useful for approximating both forward and inverse distribution statistics, within an additive error ε and provable success probability 1−δ. Our sampling method improves by an order of magnitude the known processing time of each stream element, a crucial factor in data stream applications, thereby providing a feasible solution to the problem. For example, for a sample of size O(ε−2 log(1/δ)) in non-strict streams, our solution requires O((loglog(1/ε))2+(loglog(1/δ)) 2) operations per stream element, whereas the best previous solution requires O(ε−2 log2(1/δ)) evaluations of a fully independent hash function per element. We achieve this improvement by constructing an efficient K-elements recovery structure from which K elements can be extracted with probability 1−δ. Our structure enables our sampling algorithm to run on distributed systems and extract statistics on the difference between streams.