A fast and simple randomized parallel algorithm for the maximal independent set problem
Journal of Algorithms
Approximating the number of zeroes of a GF[2] polynomial
Journal of Algorithms
The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Reductions in streaming algorithms, with an application to counting triangles in graphs
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Processing complex aggregate queries over data streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Dynamic multidimensional histograms
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
An Approximate L1-Difference Algorithm for Massive Data Streams
SIAM Journal on Computing
One-Pass Wavelet Decompositions of Data Streams
IEEE Transactions on Knowledge and Data Engineering
Processing set expressions over continuous update streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Pairwise Independence and Derandomization
Pairwise Independence and Derandomization
Gossip-Based Computation of Aggregate Information
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Sketch-based change detection: methods, evaluation, and applications
Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
Tabulation based 4-universal hashing with applications to second moment estimation
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Finding frequent items in data streams
Theoretical Computer Science - Special issue on automata, languages and programming
Approximation techniques for spatial data
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Range-Efficient Computation of F" over Massive Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Improved range-summable random variable construction algorithms
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Domain-Driven Data Synopses for Dynamic Quantiles
IEEE Transactions on Knowledge and Data Engineering
Sketches for size of join estimation
ACM Transactions on Database Systems (TODS)
Small synopses for group-by query verification on outsourced data streams
ACM Transactions on Database Systems (TODS)
Beyond simple aggregates: indexing for summary queries
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
On sketch based anonymization that satisfies differential privacy model
AI'10 Proceedings of the 23rd Canadian conference on Advances in Artificial Intelligence
Hi-index | 0.00 |
The exact computation of aggregate queries, like the size of join of two relations, usually requires large amounts of memory (constrained in data-streaming) or communication (constrained in distributed computation) and large processing times. In this situation, approximation techniques with provable guarantees, like sketches, are one possible solution. The performance of sketches depends crucially on the ability to generate particular pseudo-random numbers. In this article we investigate both theoretically and empirically the problem of generating k-wise independent pseudo-random numbers and, in particular, that of generating 3- and 4-wise independent pseudo-random numbers that are fast range-summable (i.e., they can be summed in sublinear time). Our specific contributions are: (a) we provide a thorough comparison of the various pseudo-random number generating schemes; (b) we study both theoretically and empirically the fast range-summation property of 3- and 4-wise independent generating schemes; (c) we provide algorithms for the fast range-summation of two 3-wise independent schemes, BCH and extended Hamming; and (d) we show convincing theoretical and empirical evidence that the extended Hamming scheme performs as well as any 4-wise independent scheme for estimating the size of join of two relations using AMS sketches, even though it is only 3-wise independent. We use this scheme to generate estimators that significantly outperform state-of-the-art solutions for two problems, namely, size of spatial joins and selectivity estimation.