Progressive optimization in a shared-nothing parallel database

Authors:
Wook-Shin Han; Jack Ng; Volker Markl; Holger Kache; Mokhtar Kandil
Affiliations:
Kyungpook National University, Daegu, South Korea; IBM Toronto Lab, Toronto, Canada; IBM Almaden Research Center, San Jose, CA; IBM Silicon Valley Lab, San Jose, CA; IBM Toronto Lab, Toronto, Canada
Venue:
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Year:
2007

Abstract

Commercial enterprise data warehouses are typically implemented on parallel databases because of the inherent scalability and performance limitations of serial architectures. Queries against such large data warehouses can contain complex predicates as well as multiple joins, and the query execution plans generated by the optimizer may be sub-optimal because of mis-estimated row cardinalities. Progressive optimization (POP) is an approach that detects cardinality estimation errors by monitoring actual cardinalities at run time and recovers by triggering re-optimization with the measured cardinalities. However, the original POP solution was designed for a serial processing architecture, and its core ideas cannot be readily applied to a parallel shared-nothing environment. Extending POP to a parallel environment is challenging because we need to determine when and how to trigger re-optimization based on cardinalities collected from multiple independent nodes. In this paper, we present a comprehensive and practical solution to this problem, including several novel voting schemes for deciding whether to trigger re-optimization, a mechanism to reuse local intermediate results across nodes as a partitioned materialized view, several flavors of parallel checkpoint operators, and parallel checkpoint processing methods that use efficient communication protocols. This solution has been prototyped in a leading commercial parallel DBMS. We have performed extensive experiments using the TPC-H benchmark and a real-world database. Experimental results show that our solution has negligible runtime overhead and speeds up complex OLAP queries by up to a factor of 22.
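The abstract does not spell out how the voting schemes work, so the Python below is only a hypothetical sketch of the general idea behind a parallel checkpoint decision. Each node compares the row count it observed at a checkpoint against a range of global cardinalities for which the current plan is assumed to remain acceptable, then casts a vote, and a coordinator combines the votes. The names `ValidityRange`, `VotingPolicy`, `local_vote`, `should_reoptimize`, and `global_check`, the three policies, and the uniform-partitioning extrapolation (local rows times node count) are all illustrative assumptions, not the paper's actual method or any DBMS API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


@dataclass(frozen=True)
class ValidityRange:
    """Hypothetical: global cardinalities for which the current plan is assumed acceptable."""
    low: float
    high: float

    def contains(self, cardinality: float) -> bool:
        return self.low <= cardinality <= self.high


class VotingPolicy(Enum):
    ANY = "any"            # re-optimize if at least one node votes yes
    MAJORITY = "majority"  # re-optimize if more than half of the nodes vote yes
    ALL = "all"            # re-optimize only if every node votes yes


def local_vote(local_rows: int, rng: ValidityRange, num_nodes: int) -> bool:
    """One node's vote. Assumes rows are spread uniformly across nodes, so the
    node extrapolates the global cardinality as local_rows * num_nodes."""
    return not rng.contains(local_rows * num_nodes)


def should_reoptimize(local_counts: Sequence[int], rng: ValidityRange,
                      policy: VotingPolicy) -> bool:
    """Coordinator-side decision: combine per-node votes under the given policy."""
    if not local_counts:
        raise ValueError("need at least one node")
    n = len(local_counts)
    yes = sum(local_vote(c, rng, n) for c in local_counts)
    if policy is VotingPolicy.ANY:
        return yes >= 1
    if policy is VotingPolicy.MAJORITY:
        return yes > n / 2
    return yes == n


def global_check(local_counts: Sequence[int], rng: ValidityRange) -> bool:
    """Alternative without voting: sum the exact local counts (only valid once every
    node has finished the checkpointed subplan) and test the true global cardinality."""
    return not rng.contains(sum(local_counts))


if __name__ == "__main__":
    rng = ValidityRange(300, 1000)
    skewed = [100, 120, 5000, 110]  # node 3 holds far more rows than the others
    print(should_reoptimize(skewed, rng, VotingPolicy.ANY))       # True
    print(should_reoptimize(skewed, rng, VotingPolicy.MAJORITY))  # False
    print(global_check(skewed, rng))                              # True (5330 > 1000)
```

In this sketch the policy trades sensitivity against robustness to data skew: `ANY` reacts to a single badly skewed partition, while `MAJORITY` and `ALL` ignore it even though the summed global cardinality falls well outside the range.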