Efficient and scalable statistics gathering for large databases in Oracle 11g

  • Authors:
  • Sunil Chakkappen;Thierry Cruanes;Benoit Dageville;Linan Jiang;Uri Shaft;Hong Su;Mohamed Zait

  • Affiliations:
  • Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA;Oracle, Redwood Shores, CA, USA

  • Venue:
  • Proceedings of the 2008 ACM SIGMOD international conference on Management of data
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Large tables are often decomposed into smaller pieces called partitions in order to improve query performance and ease the data management. Query optimizers rely on both the statistics of the entire table and the statistics of the individual partitions to select a good execution plan for a SQL statement. In Oracle 10g, we scan the entire table twice, one pass for gathering the table level statistics and the other pass for gathering the partition level statistics. A consequence of this gathering method is that, when the data in some partitions change, not only do we need to scan the changed partitions to gather the partition level statistics, but also we have to scan the entire table again to gather the table level statistics. Oracle 11g adopts a one-pass distinct sampling based method which can accurately derive the table level statistics from the partition level statistics. When data change, Oracle only re-gathers the statistics for the changed partitions and then derives the table level statistics without touching the unchanged partitions. To the best of our knowledge, although the one-pass distinct sampling has been researched in academia for some years, Oracle is the first commercial database that implements the technique. We have performed extensive experiments on both benchmark data and real customer data. Our experiments illustrate the this new method is highly accurate and has significantly better performance than the old method used in Oracle 10g.