Exact computation of aggregate queries usually requires large amounts of memory (constrained in data streaming) or communication (constrained in distributed computation), as well as long processing times. In this situation, approximation techniques with provable guarantees, such as sketches, are the only viable solution. The performance of sketches depends crucially on the ability to generate particular pseudo-random numbers efficiently. In this paper we investigate, both theoretically and empirically, the problem of generating k-wise independent pseudo-random numbers and, in particular, that of generating 3-wise and 4-wise independent pseudo-random numbers that are fast range-summable (i.e., their sum over a range can be computed in time sub-linear in the size of the range). Our specific contributions are: (a) we provide an empirical comparison of the various pseudo-random number generating schemes; (b) we study, both theoretically and empirically, the practicality of fast range-summation for the 3-wise and 4-wise independent generating schemes, and we provide efficient implementations for the 3-wise independent schemes; (c) we present convincing theoretical and empirical evidence that the extended Hamming scheme performs as well as any 4-wise independent scheme for estimating the size of joins using AMS sketches, even though it is only 3-wise independent. We use this generating scheme to produce estimators that significantly outperform the state-of-the-art solutions for two problems: size of spatial joins and selectivity estimation.
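To make the setting concrete, the following is a minimal Python sketch of an AMS-style size-of-join estimator driven by an extended Hamming (EH3) style generator. It rests on assumptions not stated in the abstract: the generator is taken to be ξ(i) = (-1)^(s0 ⊕ S·i ⊕ h(i)), with h(i) the XOR of the ORs of adjacent bit pairs of i; the names `eh3`, `ams_sketch` and `join_size_estimate` are hypothetical; and the median-of-means combination is the usual textbook choice rather than anything confirmed here. Fast range-summation is not implemented; every value is generated individually.

```python
import random
from statistics import median

PAIR_MASK = 0x5555555555555555  # selects bit positions 0, 2, 4, ... (indices up to 64 bits)


def eh3(seed: int, i: int, n_bits: int) -> int:
    """Return +1 or -1 for index i (0 <= i < 2**n_bits) under an (n_bits + 1)-bit seed.

    Assumed form: exponent = s0 XOR (S . i over GF(2)) XOR h(i).
    """
    s0 = seed & 1                                 # constant seed bit
    s = (seed >> 1) & ((1 << n_bits) - 1)         # n_bits-bit seed vector S
    linear = bin(s & i).count("1") & 1            # parity of S AND i
    # h(i): OR each adjacent bit pair, then XOR the pair results together.
    pair_ors = (i | (i >> 1)) & PAIR_MASK
    nonlinear = bin(pair_ors).count("1") & 1
    return 1 - 2 * (s0 ^ linear ^ nonlinear)


def ams_sketch(freq: dict, seeds: list, n_bits: int) -> list:
    """One counter per seed: sum over the domain of frequency * xi(value)."""
    return [sum(f * eh3(seed, v, n_bits) for v, f in freq.items())
            for seed in seeds]


def join_size_estimate(sk_f: list, sk_g: list, groups: int) -> float:
    """Median of group means of the per-seed counter products."""
    products = [a * b for a, b in zip(sk_f, sk_g)]
    size = len(products) // groups
    means = [sum(products[g * size:(g + 1) * size]) / size
             for g in range(groups)]
    return median(means)


# Illustrative data: two relations described by their join-attribute frequencies.
rng = random.Random(0)
n_bits = 16
seeds = [rng.getrandbits(n_bits + 1) for _ in range(200)]
F = {v: 1 for v in range(0, 1000)}
G = {v: 2 for v in range(500, 1500)}

exact = sum(f * G.get(v, 0) for v, f in F.items())
approx = join_size_estimate(ams_sketch(F, seeds, n_bits),
                            ams_sketch(G, seeds, n_bits),
                            groups=5)
print("exact join size:", exact)
print("sketch estimate:", approx)
```

Both relations must be sketched with the same seeds, since the estimator relies on ξ(v)·ξ(v) = 1 for matching values and on the independence of the generator to cancel the cross terms in expectation. The number of counters and groups used above is arbitrary and only meant to keep the example small.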