Balancing histogram optimality and practicality for query result size estimation

Authors:
Yannis E. Ioannidis; Viswanath Poosala
Affiliations:
Computer Sciences Department, University of Wisconsin, Madison, WI (both authors)
Venue:
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Year:
1995

Abstract

Many current database systems use histograms to approximate the frequency distribution of values in the attributes of relations and, based on them, estimate query result sizes and access plan costs. In choosing among the various histograms, one has to balance two conflicting goals: optimality, so that generated estimates have the least error, and practicality, so that histograms can be constructed and maintained efficiently. In this paper, we present both theoretical and experimental results on several issues related to this trade-off. Our overall conclusion is that the most effective approach is to focus on the class of histograms that accurately maintain the frequencies of a few attribute values and assume a uniform distribution for the rest, and to choose for each relation the histogram in that class that is optimal for a self-join query.
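
The class of histograms named in the conclusion can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the values kept exactly are taken to be the k most frequent ones, the remaining values share one averaged frequency, and the self-join size is computed as the sum of squared frequencies. The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_histogram(values, k):
    """Keep exact frequencies for the k most frequent values (an assumption
    about which values to keep) and summarize the rest by their average."""
    freq = Counter(values)
    exact = dict(freq.most_common(k))
    rest = [f for v, f in freq.items() if v not in exact]
    rest_avg = sum(rest) / len(rest) if rest else 0.0
    return exact, len(rest), rest_avg


def estimate_self_join(hist):
    """Estimate the self-join size from the histogram: exact values contribute
    f*f each, the others contribute the squared average frequency each."""
    exact, rest_count, rest_avg = hist
    return sum(f * f for f in exact.values()) + rest_count * rest_avg ** 2


def true_self_join(values):
    """Exact self-join size: sum of squared frequencies over distinct values."""
    return sum(f * f for f in Counter(values).values())


if __name__ == "__main__":
    column = ["a"] * 6 + ["b"] * 2 + ["c"] + ["d"]
    for k in (1, 2, 4):
        hist = build_histogram(column, k)
        print(k, estimate_self_join(hist), true_self_join(column))
```

In this sketch, raising k trades storage for accuracy: once every distinct value is kept exactly, the estimate matches the true self-join size.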