Automated statistics collection in DB2 UDB

Authors:
A. Aboulnaga;P. Haas;M. Kandil;S. Lightstone;G. Lohman;V. Markl;I. Popivanov;V. Raman
Affiliations:
IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Toronto Development Lab, Markham, ON, Canada;IBM Toronto Development Lab, Markham, ON, Canada;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Toronto Development Lab, Markham, ON, Canada;IBM Almaden Research Center, San Jose, CA
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004

Abstract

The use of inaccurate or outdated database statistics by the query optimizer in a relational DBMS often results in a poor choice of query execution plans and hence unacceptably long query processing times. Configuration and maintenance of these statistics have traditionally been a time-consuming manual operation, requiring that the database administrator (DBA) continually monitor query performance and data changes in order to determine when to refresh the statistics values and when and how to adjust the set of statistics that the DBMS maintains. In this paper we describe the new Automated Statistics Collection (ASC) component of IBM® DB2® Universal Database™ (DB2 UDB). This autonomic technology frees the DBA from the tedious task of manually supervising the collection and maintenance of database statistics. ASC monitors both the update-delete-insert (UDI) activities on the data and query feedback (QF), i.e., the results of the queries that are executed on the data. ASC uses these two sources of information to automatically decide which statistics to collect and when to collect them. This combination of UDI-driven and QF-driven autonomic processes ensures that the system can handle unforeseen queries while also ensuring good performance for frequent and important queries. We present the basic concepts, architecture, and key implementation details of ASC in DB2 UDB, and present a case study showing how the use of ASC can speed up a query workload by orders of magnitude without requiring any DBA intervention.
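
As a rough illustration of how UDI-driven and QF-driven signals might be combined, the following minimal Python sketch flags tables for a statistics refresh and orders them. It is not the ASC implementation: the abstract does not give the decision rules, so the `TableActivity` record, the 10% UDI fraction, the factor-of-2 estimation-error threshold, and the ordering rule are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableActivity:
    """Hypothetical per-table monitoring record (not a DB2 structure)."""
    name: str
    row_count: int   # rows at the time statistics were last collected
    udi_count: int   # updates + deletes + inserts observed since then
    # Query feedback: estimated/actual cardinality ratios (assumed > 0)
    qf_ratios: List[float] = field(default_factory=list)


def worst_error(table: TableActivity) -> float:
    """Largest estimation error factor seen in query feedback (1.0 = exact)."""
    return max((max(r, 1.0 / r) for r in table.qf_ratios), default=1.0)


def needs_refresh(table: TableActivity,
                  udi_fraction: float = 0.10,    # assumed threshold
                  qf_threshold: float = 2.0) -> bool:  # assumed threshold
    """Flag a table if either the UDI signal or the QF signal fires."""
    udi_driven = table.udi_count >= udi_fraction * max(table.row_count, 1)
    qf_driven = worst_error(table) >= qf_threshold
    return udi_driven or qf_driven


def refresh_order(tables: List[TableActivity]) -> List[str]:
    """Order flagged tables: worst query-feedback error first,
    then largest relative amount of data change."""
    flagged = [t for t in tables if needs_refresh(t)]
    flagged.sort(key=lambda t: (-worst_error(t),
                                -t.udi_count / max(t.row_count, 1)))
    return [t.name for t in flagged]


tables = [
    TableActivity("orders", row_count=1000, udi_count=500),
    TableActivity("customers", row_count=1000, udi_count=10, qf_ratios=[0.1]),
    TableActivity("parts", row_count=1000, udi_count=5, qf_ratios=[1.1]),
]
order = refresh_order(tables)
print(order)
```

In this toy setup, a table touched by badly estimated queries is prioritized even if its data has barely changed, while a heavily modified table is still refreshed even though no query has yet reported feedback on it. This mirrors the abstract's point that the two signals cover frequent queries and unforeseen queries respectively.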