R* optimizer validation and performance evaluation for local queries

Authors:
Lothar F. Mackert;Guy M. Lohman
Affiliations:
University of Erlangen-Nürnberg IMMD-IV Martensstr 3, D-8520 Erlangen, West Germany;IBM Almaden Research Center, San Jose, CA
Venue:
SIGMOD '86 Proceedings of the 1986 ACM SIGMOD international conference on Management of data
Year:
1986


Abstract

Few database query optimizer models have been validated against actual performance. This paper presents the methodology and results of a thorough validation of the optimizer and evaluation of the performance of the experimental distributed relational database management system R*, which inherited the optimization algorithms of System R and extended them to a distributed environment. Optimizer-estimated costs and actual R* resources consumed were written to database tables using new SQL commands, permitting automated control from SQL application programs of test data collection and reduction. A number of tests were run over a wide variety of dynamically created test databases, SQL queries, and system parameters. The results for single-table access, sorting, and local 2-table joins are reported here. The tests confirmed the accuracy of the majority of the I/O cost model, the significant contribution of CPU cost to total cost, and the need to model CPU cost in more detail than was done in System R. The R* optimizer now retains cost components separately and estimates the number of CPU instructions, including those for applying different kinds of predicates. The sensitivity of I/O cost to buffer space motivated the development of more detailed models of buffer utilization: unclustered index scans and nested-loop joins often benefit from pages remaining in the buffers, whereas concurrent scans of the data pages and the index pages for multiple tables during joins compete for buffer share. Without an index on the join column of the inner table, the optimizer correctly avoids the nested-loop join, confirming the need for merge-scan joins.
When the join column of the inner table is indexed, the optimizer overestimates the cost of the nested-loop join, whose actual performance is very sensitive to three parameters that are extremely difficult to estimate: (1) the join (result) cardinality, (2) the outer table's cardinality, and (3) the number of buffer pages available to store the inner table. Suggestions are given for improved database statistics, prefetch and page replacement strategies for the buffer manager, and the use of temporary indexes and Bloom filters (hashed semijoins) to reduce access of unneeded data.
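
The Bloom-filter (hashed semijoin) idea mentioned above can be illustrated with a minimal sketch. This is not the R* implementation: the class `BloomFilter`, the function `bloom_semijoin`, the filter size (1024 bits), and the number of hash functions (3) are all hypothetical choices made for illustration. The sketch hashes the outer table's join-column values into a bit vector, uses that vector to discard inner rows that cannot possibly join, and then performs an exact join on the surviving candidates.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit vector with several hash functions (illustrative sizes)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def bloom_semijoin(outer_rows, inner_rows, outer_key, inner_key):
    """Join two lists of dicts, pre-filtering the inner rows with a Bloom filter.

    Returns the joined (outer, inner) pairs and the number of inner rows
    that survived the filter.
    """
    bloom = BloomFilter()
    for row in outer_rows:
        bloom.add(row[outer_key])

    # Cheap filter: drop inner rows whose key certainly has no match.
    candidates = [r for r in inner_rows if bloom.might_contain(r[inner_key])]

    # Exact join on the candidates removes any false positives.
    by_key = {}
    for r in candidates:
        by_key.setdefault(r[inner_key], []).append(r)
    pairs = [(o, i) for o in outer_rows for i in by_key.get(o[outer_key], [])]
    return pairs, len(candidates)


outer = [{"id": 1}, {"id": 2}]
inner = [{"dept": k, "name": f"e{k}"} for k in range(1, 101)]
pairs, surviving = bloom_semijoin(outer, inner, "id", "dept")
print(len(pairs), "joined pairs;", surviving, "of", len(inner), "inner rows kept")
```

Because a Bloom filter never yields false negatives, the filtered join returns the same result as an unfiltered one; the saving comes from the inner rows that are skipped before the exact match.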