A rangesum query to an array A is a pair (l, r) of range endpoints, which should be answered by $\sum_{l \le i \le r} A[i]$. To compress A, we consider representing it lossily by a histogram H, a function that is constant on each of a small number of buckets. We then answer range queries from H instead of from A, i.e., as $\sum_{l \le i \le r} H[i]$. An optimal rangesum histogram H for this purpose is one whose bucket boundaries and constant heights within buckets are chosen to minimize the expected square error, $E_{l,r}\big[\big(\sum_{l \le i \le r} A[i] - \sum_{l \le i \le r} H[i]\big)^2\big]$, assuming each rangesum query is equally likely. Rangesum histograms find many applications in database systems.

In a degenerate variation, all rangesum queries are over ranges of size one, namely, individual points; histograms optimal for this special case are called pointwise optimal histograms. The pointwise optimal histogram is a classical notion in statistics and approximation theory, but the rangesum optimal histogram appears to be novel in these areas. While optimal pointwise histograms can be constructed efficiently by simple dynamic programming, no efficient (even approximate) general rangesum histogram construction algorithms were previously known. In practice, all commercial database systems use heuristically built histograms for pointwise and rangesum queries.

We present the first general algorithms for approximate rangesum histograms. Given a parameter B, we denote by (α, β)-approximation an algorithm that produces an (αB)-bucket histogram with error at most β times the error of the optimal B-bucket histogram. We give a (2, 1)-approximation with runtime $O(N^2 B)$, a (2, 1+ε)-approximation with runtime $N + (B \log(N)/\epsilon)^{O(1)}$, and a (1, 1+ε)-approximation with runtime $O(B^3 N^4/\epsilon^2)$. We also consider the problem of dynamic maintenance of rangesum histograms for data updated by additive changes, and we give a (2, 1+ε)-approximation that uses space $(B \log(N)/\epsilon)^{O(1)}$ and time $(B \log(N)/\epsilon)^{O(1)}$ for update and query operations.
These bounds are nearly competitive with some of the best known bounds for constructing pointwise optimal histograms, modulo a small additional number of buckets; however, rangesum histograms are substantially harder to construct because of the long-range dependence between subproblems.
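
The two error measures defined above can be made concrete with a small sketch. It is an illustration only and does not implement any of the approximation algorithms announced in the abstract. It evaluates the expected square rangesum error by brute force over all ranges, evaluates the pointwise error, and builds a pointwise optimal histogram with a standard textbook dynamic program. The function names (`histogram_array`, `rangesum_error`, `pointwise_error`, `pointwise_optimal`) and the representation of a histogram as bucket start indices plus heights are assumptions made for this example.

```python
from itertools import accumulate


def histogram_array(boundaries, heights, n):
    """Expand a histogram into a length-n array H.

    boundaries: sorted bucket start indices, beginning with 0 (assumed layout).
    heights: one constant value per bucket.
    """
    ends = list(boundaries[1:]) + [n]
    h = []
    for start, end, height in zip(boundaries, ends, heights):
        h.extend([height] * (end - start))
    return h


def rangesum_error(a, h):
    """Mean square error over all ranges (l, r), l <= r, each equally likely."""
    n = len(a)
    # d[k] is the prefix sum of (A - H) over the first k entries, so the
    # error of the query (l, r) is d[r + 1] - d[l].
    d = [0.0] + list(accumulate(x - y for x, y in zip(a, h)))
    total = 0.0
    for l in range(n):
        for r in range(l, n):
            total += (d[r + 1] - d[l]) ** 2
    return total / (n * (n + 1) / 2)


def pointwise_error(a, h):
    """Mean square error over single-point queries."""
    return sum((x - y) ** 2 for x, y in zip(a, h)) / len(a)


def pointwise_optimal(a, b):
    """Textbook O(N^2 B) dynamic program for a b-bucket histogram that
    minimizes the sum of squared pointwise errors (requires b <= len(a))."""
    n = len(a)
    s = [0.0] + list(accumulate(a))
    sq = [0.0] + list(accumulate(x * x for x in a))

    def sse(i, j):
        # Squared error of a[i:j] against its own mean.
        total = s[j] - s[i]
        return (sq[j] - sq[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimum error covering a[:j] with exactly k buckets.
    best = [[inf] * (n + 1) for _ in range(b + 1)]
    cut = [[0] * (n + 1) for _ in range(b + 1)]
    best[0][0] = 0.0
    for k in range(1, b + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = best[k - 1][i] + sse(i, j)
                if cand < best[k][j]:
                    best[k][j] = cand
                    cut[k][j] = i

    # Walk the recorded cuts backwards to recover buckets and their means.
    boundaries, heights = [], []
    j = n
    for k in range(b, 0, -1):
        i = cut[k][j]
        boundaries.append(i)
        heights.append((s[j] - s[i]) / (j - i))
        j = i
    boundaries.reverse()
    heights.reverse()
    return boundaries, heights
```

For A = [1, 3, 5, 11] and B = 2, `pointwise_optimal` returns buckets starting at indices 0 and 3 with heights 3 and 11. That histogram has pointwise error 2.0 and rangesum error 2.4, whereas the even split with heights 2 and 8 has pointwise error 5.0 and rangesum error 3.4. The brute-force evaluation sums over all N(N+1)/2 ranges, and each range's error depends on every bucket it crosses, which gives a small-scale view of the long-range dependence mentioned above.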