On estimating the cardinality of the projection of a database relation
ACM Transactions on Database Systems (TODS)
Dynamic query evaluation plans
SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
Multiple join size estimation by virtual domains (extended abstract)
PODS '93 Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
On the estimation of join result sizes
EDBT '94 Proceedings of the 4th international conference on extending database technology: Advances in database technology
Efficient mid-query re-optimization of sub-optimal query execution plans
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Cost-based query scrambling for initial delays
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Solving Local Cost Estimation Problem for Global Query Optimization in Multidatabase Systems
Distributed and Parallel Databases
Implications of certain assumptions in database performance evauation
ACM Transactions on Database Systems (TODS)
Answering complex SQL queries using automatic summary tables
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Eddies: continuously adaptive query processing
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Partial results for online query processing
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Optimizing Queries Across Diverse Data Sources
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Query Optimization in a Heterogeneous DBMS
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
LEO - DB2's LEarning Optimizer
Proceedings of the 27th International Conference on Very Large Data Bases
Calibrating the Query Optimizer Cost Model of IRO-DB, an Object-Oriented Federated Database System
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Query processing and optimization in Oracle Rdb
The VLDB Journal — The International Journal on Very Large Data Bases
Mariposa: A New Architecture for Distributed Data
Mariposa: A New Architecture for Distributed Data
Adapting to source properties in processing data integration queries
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Robust query processing through progressive optimization
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Consistently estimating the selectivity of conjuncts of predicates
VLDB '05 Proceedings of the 31st international conference on Very large data bases
POP/FED: progressive query optimization for federated queries in DB2
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Foundations and Trends in Databases
Time-completeness trade-offs in record linkage using adaptive query processing
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Autonomic query parallelization using non-dedicated computers: an evaluation of adaptivity options
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
Database Management Systems (DBMS) perform query plan selection by mathematically modeling the execution cost of candidate execution plans and choosing the cheapest query execution plan (QEP) according to that cost model. The cost model requires accurate estimates of the sizes of intermediate results of all steps in the QEP. Outdated or incomplete statistics, parameter markers and complex skewed data frequently cause the selection of a suboptimal query plan, which in turn results in bad query performance. Federated queries are regular relational queries accessing data on one or more remote relational or non-relational data sources, possibly combining them with tables stored in the federated DBMS server. Their execution is typically divided between the federated server and the remote data sources. Outdated and incomplete statistics have a bigger impact on federated DBMS than on regular DBMS, as maintenance of federated statistics is unequally more complicated and expensive than the maintenance of the local statistics; consequently bad performance commonly occurs for federated queries due to the selection of a suboptimal query plan. We present an extension of the mid-query reoptimization technique "Progressive Query Optimization" (POP), which adds robustness to query processing by dynamically detecting if an access plan is suboptimal and by triggering a reoptimization in that case. Our extensions enable efficient reoptimization of federated queries. Our contributions are (a) an opportunistic, but risk controlled, reoptimization technique for federated DBMS (b) a technique for multiple reoptimizations during federated query processing, with a strategy to discover redundant and eliminate partial results and (c) a mechanism to eagerly procure statistics in a federated environment. We have implemented these techniques in a prototype version of WebSphere Information Integrator for DB2. Our enhancements enable robust and acceptable performance for federated queries, even if the remote data sources provided almost no statistical information about the data. An extensive case study on real world data shows POP has negligible runtime overhead and improves the performance of complex federated queries by up to a full order of magnitude.