Optimal and approximate computation of summary statistics for range aggregates

Authors:
Anna C. Gilbert;Yannis Kotidis;S. Muthukrishnan;Marin J. Strauss
Affiliations:
AT&T Labs--Research, Florham Park, NJ;AT&T Labs--Research, Florham Park, NJ;AT&T Labs--Research, Florham Park, NJ;AT&T Labs--Research, Florham Park, NJ
Venue:
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2001

Citing 14
Cited 27

The visual display of quantitative information

The visual display of quantitative information
Statistical profile estimation in database systems

ACM Computing Surveys (CSUR)
Ten lectures on wavelets

Ten lectures on wavelets
Adapted wavelet analysis from theory to software

Adapted wavelet analysis from theory to software
Online aggregation

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Random sampling for histogram construction: how much is enough?

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Approximate computation of multidimensional aggregates of sparse data using wavelets

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
The Aqua approximate query answering system

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Optimal histograms for hierarchical range queries (extended abstract)

PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Towards estimation error guarantees for distinct values

PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Accurate estimation of the number of tuples satisfying a condition

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Optimal Histograms with Quality Guarantees

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Universality of Serial Histograms

VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases

How to evaluate multiple range-sum queries progressively

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Fast algorithms for hierarchical range histogram construction

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Rangesum histograms

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
ProPolyne: A Fast Wavelet-Based Algorithm for Progressive Evaluation of Polynomial Range-Sum Queries

EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries

Proceedings of the 27th International Conference on Very Large Data Bases
Approximate Query Processing: Taming the TeraBytes

Proceedings of the 27th International Conference on Very Large Data Bases
One-Pass Wavelet Decompositions of Data Streams

IEEE Transactions on Knowledge and Data Engineering
Fast range query estimation by N-level tree histograms

Data & Knowledge Engineering
Spatiotemporal Aggregate Computation: A Survey

IEEE Transactions on Knowledge and Data Engineering
Approximation algorithms for wavelet transform coding of data streams

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Approximation and streaming algorithms for histogram construction problems

ACM Transactions on Database Systems (TODS)
Inner-product based wavelet synopses for range-sum queries

ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
REHIST: relative error histogram construction algorithms

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
A privacy-preserving index for range queries

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Histograms based on the minimum description length principle

The VLDB Journal — The International Journal on Very Large Data Bases
Hierarchical synopses with optimal error guarantees

ACM Transactions on Database Systems (TODS)
Wavelet synopsis for hierarchical range queries with workloads

The VLDB Journal — The International Journal on Very Large Data Bases
Exploiting Spatio-temporal Correlations for Data Processing in Sensor Networks

GeoSensor Networks
Plot Query Processing with Wavelets

SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
On the space---time of optimal, approximate and streaming algorithms for synopsis construction problems

The VLDB Journal — The International Journal on Very Large Data Bases
Multiplicative synopses for relative-error metrics

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
RFID Data Aggregation

GSN '09 Proceedings of the 3rd International Conference on GeoSensor Networks
Enabling OLAP in mobile environments via intelligent data cube compression techniques

Journal of Intelligent Information Systems
Building data synopses within a known maximum error bound

APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
A top-down approach for compressing data cubes under the simultaneous evaluation of multiple hierarchical range queries

Journal of Intelligent Information Systems
An efficient algorithm for computing range-groupby queries

DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
A probabilistic framework for estimating the accuracy of aggregate range queries evaluated over histograms

Information Sciences: an International Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and online query processing. Hence, there has been a lot of focus on “selectivity estimation”, that is, computing summary statistics on the underlying data and using that to answer aggregate queries fast and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries.First, we focus on a histogram as summary statistics and present algorithms for constructing histograms that are provably optimal (or provably approximate) for range queries; these algorithms take (pseudo-) polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries).Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously-known wavelet-based methods have this property.We perform an experimental study of the various summary representations show the benefits of our algorithms over the known methods.