Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
Processing aggregate relational queries with hard time constraints
SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
A linear-time probabilistic counting algorithm for database applications
ACM Transactions on Database Systems (TODS)
Error-constrained COUNT query evaluation in relational databases
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Size-estimation framework with applications to transitive closure and reachability
Journal of Computer and System Sciences
Towards estimation error guarantees for distinct values
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
New TPC benchmarks for decision support and web commerce
ACM SIGMOD Record
Modern Information Retrieval
Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
MUDD: a multi-dimensional data generator
WOSP '04 Proceedings of the 4th international workshop on Software and performance
A dip in the reservoir: maintaining sample synopses of evolving datasets
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
On synopses for distinct-value estimation under multiset operations
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Maintaining bernoulli samples over evolving multisets
Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Automated statistics collection in DB2 UDB
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Brighthouse: an analytic data warehouse for ad-hoc queries
Proceedings of the VLDB Endowment
Optimizer plan change management: improved stability and performance in Oracle 11g
Proceedings of the VLDB Endowment
Closing the query processing loop in Oracle 11g
Proceedings of the VLDB Endowment
An efficient features-based processing technique for supergraph queries
Proceedings of the Fourteenth International Database Engineering & Applications Symposium
Implementing vertical splitting for large scale multidimensional datasets and its evaluations
DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Large tables are often decomposed into smaller pieces called partitions in order to improve query performance and ease the data management. Query optimizers rely on both the statistics of the entire table and the statistics of the individual partitions to select a good execution plan for a SQL statement. In Oracle 10g, we scan the entire table twice, one pass for gathering the table level statistics and the other pass for gathering the partition level statistics. A consequence of this gathering method is that, when the data in some partitions change, not only do we need to scan the changed partitions to gather the partition level statistics, but also we have to scan the entire table again to gather the table level statistics. Oracle 11g adopts a one-pass distinct sampling based method which can accurately derive the table level statistics from the partition level statistics. When data change, Oracle only re-gathers the statistics for the changed partitions and then derives the table level statistics without touching the unchanged partitions. To the best of our knowledge, although the one-pass distinct sampling has been researched in academia for some years, Oracle is the first commercial database that implements the technique. We have performed extensive experiments on both benchmark data and real customer data. Our experiments illustrate the this new method is highly accurate and has significantly better performance than the old method used in Oracle 10g.