Automated statistics collection in action

Authors:
P. Haas;M. Kandil;A. Lerner;V. Markl;I. Popivanov;V. Raman;D. Zilio
Affiliations:
IBM Almaden Research Center, San Jose, CA;IBM Toronto Development Laboratory, ON, Canada;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Toronto Development Laboratory, Markham, ON, Canada;IBM Almaden Research Center, San Jose, CA;IBM Toronto Development Laboratory, Markham, ON, Canada
Venue:
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Year:
2005

Citing 2
Cited 0

LEO - DB2's LEarning Optimizer

Proceedings of the 27th International Conference on Very Large Data Bases
Automated statistics collection in DB2 UDB

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Quantified Score

Hi-index	0.00

Visualization

Abstract

If presented with inaccurate statistics, even the most sophisticated query optimizers make mistakes. They may wrongly estimate the output cardinality of a certain operation and thus make sub-optimal plan choices based on that cardinality. Maintaining accurate statistics is hard, both because each table may need a specifically parameterized set of statistics and because statistics get outdated as the database changes. Automated Statistic Collection (ASC) is a new component in IBM DB2 UDB that, without any DBA intervention, observes and analyzes the effects of faulty statistics and, in response, it triggers actions that continuously repair the latter. In this demonstration, we will show how ASC works to alleviate the DBA from the task of maintaining fresh, accurate statistics in several challenging scenarios. ASC is able to reconfigure the statistics collection parameters (e.g, number of frequent values for a column, or correlations between certain column pairs) on a per-table basis. ASC can also detect and guard against outdated statistics caused by high updates/inserts/deletes rates in volatile, dynamic databases. We will also show how ASC works from the inside: from how cardinality mis-estimations are introduced in different kind of operators, to how this error is propagated to later operations in the plan, to how this influences plan choices inside the optimizer.