Self-tuning histograms: building histograms without looking at data

Authors:
Ashraf Aboulnaga;Surajit Chaudhuri
Affiliations:
Computer Sciences Department, University of Wisconsin, Madison;Microsoft Research
Venue:
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Year:
1999

References:
11
Cited by:
92

Abstract

In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof, but by using feedback from the query execution engine about the actual selectivity of range selection operators to progressively refine the histogram. Since the cost of building and maintaining self-tuning histograms is independent of the data size, they provide a remarkably inexpensive way to construct histograms for large data sets with little up-front cost. Self-tuning histograms are particularly attractive as an alternative to traditional multi-dimensional histograms, which capture dependencies between attributes but are prohibitively expensive to build and maintain. We describe techniques for initializing and refining self-tuning histograms. Our experimental results show that self-tuning histograms provide a low-cost alternative to traditional multi-dimensional histograms, with little loss of accuracy for data distributions with low to moderate skew.
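The abstract describes the feedback loop only at a high level. What follows is a minimal Python sketch of a one-dimensional variant, built on assumptions of our own rather than the paper's stated method. The domain bounds and the table cardinality are assumed known. Initialization assumes a uniform distribution. Each observed range selectivity spreads a damped share of the estimation error (via a hypothetical damping factor `alpha`) over the overlapping buckets, in proportion to their estimated contributions. The class `SelfTuningHistogram1D` and its methods are hypothetical names, and the sketch does no bucket restructuring or multi-dimensional gridding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SelfTuningHistogram1D:
    """Hypothetical 1-D histogram refined only from query feedback."""

    low: float            # assumed known minimum of the attribute domain
    high: float           # assumed known maximum of the attribute domain
    total_rows: float     # assumed known table cardinality
    num_buckets: int
    alpha: float = 0.5    # hypothetical damping factor in (0, 1]
    bounds: List[float] = field(init=False, default_factory=list)
    freqs: List[float] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Initialization without reading data: assume a uniform distribution,
        # i.e. equal-width buckets that each hold total_rows / num_buckets rows.
        width = (self.high - self.low) / self.num_buckets
        self.bounds = [self.low + i * width for i in range(self.num_buckets + 1)]
        self.freqs = [self.total_rows / self.num_buckets] * self.num_buckets

    def _overlap(self, i: int, lo: float, hi: float) -> float:
        # Fraction of bucket i that lies inside the query range [lo, hi).
        b_lo, b_hi = self.bounds[i], self.bounds[i + 1]
        covered = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        return covered / (b_hi - b_lo)

    def estimate(self, lo: float, hi: float) -> float:
        # Estimated row count for lo <= x < hi, assuming uniformity inside each bucket.
        return sum(f * self._overlap(i, lo, hi) for i, f in enumerate(self.freqs))

    def refine(self, lo: float, hi: float, actual: float) -> None:
        # Feedback step: compare the estimate with the actual selectivity reported
        # by the execution engine and spread a damped share of the error over
        # the overlapping buckets, weighted by each bucket's estimated contribution.
        fracs = [self._overlap(i, lo, hi) for i in range(self.num_buckets)]
        contribs = [f * w for f, w in zip(self.freqs, fracs)]
        est = sum(contribs)
        # If the estimate is zero, fall back to weighting by overlap alone.
        weights = contribs if est > 0 else fracs
        total_w = sum(weights)
        if total_w == 0:
            return  # the query range does not intersect the histogram's domain
        error = actual - est
        for i, w in enumerate(weights):
            # Clamp at zero so no bucket ends up with a negative frequency.
            self.freqs[i] = max(0.0, self.freqs[i] + self.alpha * error * w / total_w)


if __name__ == "__main__":
    h = SelfTuningHistogram1D(low=0.0, high=100.0, total_rows=1000.0, num_buckets=4)
    print(h.estimate(0, 50))             # 500.0 under the uniform assumption
    h.refine(0, 25, actual=700.0)        # feedback: the low end is denser
    print(h.estimate(0, 25))             # 475.0 after one damped update
    for _ in range(19):
        h.refine(0, 25, actual=700.0)
    print(round(h.estimate(0, 25), 1))   # 700.0: the estimate has converged
```

`refine` reads only the bucket array, never the base table, so its cost grows with the number of buckets rather than with the data size, which is the property the abstract emphasizes. In this example the query range covers exactly one whole bucket. With `alpha = 0.5`, each repeated observation therefore halves the remaining error, which is why the estimate approaches the reported 700 rows.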