Efficient and scalable statistics gathering for large databases in Oracle 11g

Authors:
Sunil Chakkappen;Thierry Cruanes;Benoit Dageville;Linan Jiang;Uri Shaft;Hong Su;Mohamed Zait
Affiliations:
Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

Large tables are often decomposed into smaller pieces called partitions to improve query performance and ease data management. Query optimizers rely on both the statistics of the entire table and the statistics of the individual partitions to select a good execution plan for a SQL statement. In Oracle 10g, we scan the entire table twice: one pass gathers the table-level statistics and the other gathers the partition-level statistics. A consequence of this gathering method is that, when the data in some partitions change, we must not only scan the changed partitions to gather the partition-level statistics but also scan the entire table again to gather the table-level statistics. Oracle 11g adopts a one-pass, distinct-sampling-based method that can accurately derive the table-level statistics from the partition-level statistics. When data change, Oracle re-gathers statistics only for the changed partitions and then derives the table-level statistics without touching the unchanged partitions. To the best of our knowledge, although one-pass distinct sampling has been researched in academia for some years, Oracle is the first commercial database to implement the technique. We have performed extensive experiments on both benchmark data and real customer data. Our experiments show that this new method is highly accurate and performs significantly better than the old method used in Oracle 10g.
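
The idea of deriving a table-level distinct count from partition-level summaries can be illustrated with a small sketch. The code below is not Oracle's implementation; it is a minimal hash-based distinct-sampling synopsis in the spirit of the academic literature the abstract alludes to, and the names `Synopsis`, `gather`, `capacity`, and `level` are hypothetical. Each partition is scanned once to build a bounded set of hash values, and the table-level estimate comes from merging the per-partition synopses without rescanning any rows.

```python
import hashlib


def hash64(value):
    """Map a column value to a 64-bit integer (illustrative choice of hash)."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class Synopsis:
    """Bounded sample of distinct hash values for one column (hypothetical)."""

    def __init__(self, capacity=1024):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.level = 0        # how many times the hash domain was halved
        self.hashes = set()   # retained distinct hash values

    def _keeps(self, h):
        # A hash stays in the sample only if its lowest `level` bits are zero.
        return h & ((1 << self.level) - 1) == 0

    def _shrink(self):
        # Halve the accepted hash domain until the sample fits again.
        while len(self.hashes) > self.capacity:
            self.level += 1
            self.hashes = {h for h in self.hashes if self._keeps(h)}

    def add(self, value):
        h = hash64(value)
        if self._keeps(h):
            self.hashes.add(h)
            self._shrink()

    def merge(self, other):
        """Combine two synopses without touching the underlying rows."""
        merged = Synopsis(min(self.capacity, other.capacity))
        merged.level = max(self.level, other.level)
        merged.hashes = {h for h in self.hashes | other.hashes if merged._keeps(h)}
        merged._shrink()
        return merged

    def estimate(self):
        # Each retained hash stands for 2**level distinct values.
        return len(self.hashes) * (1 << self.level)


def gather(rows, capacity=1024):
    """One pass over a partition's column values."""
    synopsis = Synopsis(capacity)
    for value in rows:
        synopsis.add(value)
    return synopsis


if __name__ == "__main__":
    partitions = {
        "p1": gather(range(0, 400)),
        "p2": gather(range(300, 600)),
    }
    # Only the changed partition is rescanned; p1's synopsis is reused as-is.
    partitions["p2"] = gather(range(300, 900))
    table_level = partitions["p1"].merge(partitions["p2"])
    print(table_level.estimate())
```

Under these assumptions, merging the partition synopses gives the same sample that a single pass over the whole table would produce with the same capacity, which is why the unchanged partitions never need to be scanned again. The estimate is exact while the number of distinct values stays within `capacity` and approximate beyond that.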