Fast incremental maintenance of approximate histograms

Authors:
Phillip B. Gibbons;Yossi Matias;Viswanath Poosala
Affiliations:
Intel Research Pittsburgh, Pittsburgh, PA;Tel Aviv University, Tel Aviv, Israel;Bell Laboratories, Murray Hill, NJ
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2002

Abstract

Many commercial database systems maintain histograms to summarize the contents of large relations and permit efficient estimation of query result sizes for use in query optimizers. Delaying the propagation of database updates to the histogram often introduces errors into the estimation. This article presents new sampling-based approaches for incremental maintenance of approximate histograms. By scheduling updates to the histogram based on the updates to the database, our techniques are the first to keep histograms effectively up to date at all times while avoiding computational overhead when it is unnecessary. Our techniques provide highly accurate approximate histograms belonging to the equidepth and Compressed classes. Experimental results show that our new approaches provide estimates that are orders of magnitude more accurate than those of previous approaches.

A key component of these new approaches is a backing sample, an up-to-date random sample of the tuples currently in a relation. We provide efficient solutions for maintaining a uniformly random sample of a relation in the presence of updates to the relation. The backing sample techniques can be used in any other application that relies on random samples of data.
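The backing-sample idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' exact algorithm. It assumes that insertions are handled with reservoir sampling (the new tuple enters the sample with probability s/N, evicting a random member), that a deleted tuple is simply dropped from the sample, and that the sample is recomputed from the relation once it shrinks below a lower bound. The class `BackingSample`, its parameters `capacity` and `lower_bound`, the in-memory `relation` dict (which assumes unique tuple ids), and the helper `equidepth_boundaries` are all hypothetical names introduced only for this sketch.

```python
import random


class BackingSample:
    """Hypothetical sketch of a backing sample: a uniform random sample of the
    tuples currently in a relation, kept up to date under inserts and deletes."""

    def __init__(self, capacity, lower_bound, rng=None):
        self.capacity = capacity        # target sample size (assumed parameter)
        self.lower_bound = lower_bound  # rescan threshold (assumed parameter)
        self.rng = rng or random.Random()
        self.relation = {}              # stand-in for the base relation: tuple id -> value
        self.sample = {}                # sampled tuple ids -> value

    def insert(self, tid, value):
        # Assumes tid is not already present in the relation.
        self.relation[tid] = value
        n, s = len(self.relation), len(self.sample)
        if s == n - 1 and s < self.capacity:
            # The sample still holds the whole relation, so just add the tuple.
            self.sample[tid] = value
        elif self.rng.random() < s / n:
            # Reservoir step: keep the new tuple with probability s/n and
            # evict a uniformly chosen tuple, so the sample size stays s.
            victim = self.rng.choice(list(self.sample))
            del self.sample[victim]
            self.sample[tid] = value

    def delete(self, tid):
        del self.relation[tid]
        # Dropping the tuple (if sampled) leaves a smaller but still uniform sample.
        self.sample.pop(tid, None)
        if len(self.sample) < self.lower_bound and len(self.relation) > len(self.sample):
            self.rescan()

    def rescan(self):
        # Sample too small: draw a fresh uniform sample from the full relation.
        k = min(self.capacity, len(self.relation))
        chosen = self.rng.sample(list(self.relation), k)
        self.sample = {tid: self.relation[tid] for tid in chosen}


def equidepth_boundaries(values, num_buckets):
    """Approximate equidepth bucket boundaries: the sample values at the
    i/num_buckets quantiles, for i = 1 .. num_buckets - 1."""
    ordered = sorted(values)
    if not ordered:
        return []
    n = len(ordered)
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]
```

For example, after a stream of updates, `equidepth_boundaries(list(bs.sample.values()), 10)` gives approximate boundaries for a 10-bucket equidepth histogram, with each bucket then covering roughly `len(bs.relation) / 10` tuples. Picking the eviction victim via `list(self.sample)` costs O(s) per replacement; the sketch favors brevity over speed.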