Exploiting statistics on query expressions for optimization

Authors: Nicolas Bruno; Surajit Chaudhuri
Affiliations: Columbia University; Microsoft Research
Venue: Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Year: 2002

Abstract

Statistics play an important role in influencing the plans produced by a query optimizer. Traditionally, optimizers use statistics built over base tables and assume independence between attributes while propagating statistical information through the query plan. This approach can introduce large estimation errors, which may result in the optimizer choosing inefficient execution plans. In this paper, we show how to extend a generic optimizer so that it also exploits statistics built on expressions corresponding to intermediate nodes of query plans. We show that in some cases, the quality of the resulting plans is significantly better than when only base-table statistics are available. Unfortunately, even moderately sized schemas may have too many relevant candidate statistics. We introduce a workload-driven technique to identify a small subset of statistics that can provide significant benefits over just maintaining base-table statistics. Finally, we present experimental results on an implementation of our approach in Microsoft SQL Server 2000.
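
The estimation error that the abstract attributes to the independence assumption can be illustrated with a small Python sketch. This is not the authors' algorithm or the SQL Server implementation; it is a toy example under stated assumptions. The tables (`customers`, `orders`), the `tier` attribute, the data skew, and all function names are hypothetical and chosen only to make the effect visible. The sketch compares a cardinality estimate propagated from base-table statistics with one read from a statistic built directly on the join expression.

```python
from collections import Counter


def build_tables():
    """Hypothetical data: 100 customers, 10 of them 'gold'.

    Gold customers place 50 orders each, basic customers place 1,
    so the join fan-out is correlated with the filtered attribute.
    """
    customers = [(cid, "gold" if cid < 10 else "basic") for cid in range(100)]
    orders = []
    for cid, tier in customers:
        count = 50 if tier == "gold" else 1
        orders.extend((cid,) for _ in range(count))
    return customers, orders


def base_table_estimate(customers, orders, tier):
    """Estimate |orders JOIN customers WHERE tier = ?| from base tables only.

    Assumes the filter on customers.tier is independent of the join,
    which is the traditional propagation the abstract describes.
    """
    matching = sum(1 for _, t in customers if t == tier)
    join_cardinality = len(orders)  # foreign-key join: one match per order
    return join_cardinality * matching / len(customers)


def build_expression_statistic(customers, orders):
    """Build a frequency statistic on tier over the join result itself."""
    tier_of = dict(customers)
    return Counter(tier_of[cid] for (cid,) in orders)


def expression_estimate(statistic, tier):
    """Read the cardinality directly from the statistic on the expression."""
    return statistic[tier]


if __name__ == "__main__":
    customers, orders = build_tables()
    stat = build_expression_statistic(customers, orders)
    print("base-table estimate:", base_table_estimate(customers, orders, "gold"))
    print("expression-statistic estimate:", expression_estimate(stat, "gold"))
```

In this toy data the base-table estimate is 59 rows while the join actually produces 500 gold-tier rows, which the statistic on the join expression captures exactly. An optimizer working from the smaller number could plausibly pick a plan suited to a much smaller intermediate result.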