Parallelizing query optimization

Authors:
Wook-Shin Han;Wooseong Kwak;Jinsoo Lee;Guy M. Lohman;Volker Markl
Affiliations:
Kyungpook National University, Republic of Korea;Kyungpook National University, Republic of Korea;Kyungpook National University, Republic of Korea;IBM Almaden Research Center, San Jose, California;IBM Almaden Research Center, San Jose, California
Venue:
Proceedings of the VLDB Endowment
Year:
2008

References: 28
Cited by: 8

Optimization of large join queries

SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Grammar-like functional rules for representing query optimization alternatives

SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Randomized algorithms for optimizing large join queries

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Tabu search optimization of large join queries

EDBT '94 Proceedings of the 4th international conference on extending database technology: Advances in database technology
Parallelism and its price: a case study of nonstop SQL/MP

ACM SIGMOD Record
Database Management Systems

Database Management Systems
Parallel dynamic programming for solving the string editing problem on a CGM/BSP

Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles

IEEE Transactions on Parallel and Distributed Systems
Access path selection in a relational database management system

SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
A new way to compute the product and join of relations

SIGMOD '80 Proceedings of the 1980 ACM SIGMOD international conference on Management of data
Large Join Optimization on a Hypercube Multiprocessor

IEEE Transactions on Knowledge and Data Engineering
Parallel Optimization of Large Join Queries with Set Operators and Aggregates in a Parallel Environment Supporting Pipeline

IEEE Transactions on Knowledge and Data Engineering
Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Measuring the Complexity of Join Enumeration in Query Optimization

VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
Cache Coherency in Oracle Parallel Server

VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Cache Conscious Algorithms for Relational Query Processing

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Efficient processing of joins on set-valued attributes

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Estimating compilation time of a query optimizer

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Robust query processing through progressive optimization

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Efficient set joins on similarity predicates

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Adaptive load shedding for windowed stream joins

Proceedings of the 14th ACM international conference on Information and knowledge management
Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Optimal top-down join enumeration

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Progressive optimization in a shared-nothing parallel database

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
A parallel dynamic programming algorithm on a multi-core architecture

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Dynamic programming strikes back

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Is (your) database research having impact?

DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications

Dependency-aware reordering for parallelizing query optimization in multi-core CPUs

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Parallelizing extensible query optimizers

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
A Query Cache Tool for Optimizing Repeatable and Parallel OLAP Queries

DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Parallel skyline computation on multicore architectures

Information Systems
Optimizing analytic data flows for multiple execution engines

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Optimization of analytic data flows for next generation business intelligence applications

TPCTC'11 Proceedings of the Third TPC Technology conference on Topics in Performance Evaluation, Measurement and Characterization
An evolutionary multi-agent system for database query optimization

Proceedings of the 15th annual conference on Genetic and evolutionary computation
Hybrid Analytic Flows - the Case for Optimization

Fundamenta Informaticae - Scalable Workflow Enactment Engines and Technology

Abstract

Many commercial RDBMSs employ cost-based query optimization exploiting dynamic programming (DP) to efficiently generate the optimal query execution plan. However, optimization time increases rapidly for queries joining more than 10 tables. Randomized or heuristic search algorithms reduce query optimization time for large join queries by considering fewer plans, sacrificing plan optimality. Though commercial systems executing query plans in parallel have existed for over a decade, the optimization of such plans still occurs serially. While modern microprocessors employ multiple cores to accelerate computations, parallelizing query optimization to exploit multi-core parallelism is not as straightforward as it may seem. The DP used in join enumeration belongs to the challenging nonserial polyadic DP class because of its non-uniform data dependencies. In this paper, we propose a comprehensive and practical solution for parallelizing query optimization on multi-core processor architectures, including a parallel join enumeration algorithm and several alternative ways to allocate work to threads to balance their load. We also introduce a novel data structure, the skip vector array, that significantly reduces the generation of infeasible join partitions. This solution has been prototyped in PostgreSQL. Extensive experiments using various query graph topologies confirm that our algorithms allocate the work evenly, thereby achieving almost linear speed-up. Our parallel join enumeration algorithm enhanced with our skip vector array outperforms the conventional generate-and-filter DP algorithm by up to two orders of magnitude for star queries: linear speedup due to parallelism, plus an order of magnitude performance improvement due to the skip vector array.
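
To make the structure described in the abstract concrete, the following is a minimal sketch of size-driven, generate-and-filter DP join enumeration in which the candidate pairs of each size level are split across worker threads. It is not the paper's algorithm: the cost model (sum of intermediate result sizes), the round-robin work allocation, and all names (`optimize`, `join_candidates`, `card_of`, `connected`) are assumptions made for illustration, and the skip vector array is not modeled. Python threads will not show a real speedup for this CPU-bound loop; the point is only that plans of size k depend solely on smaller plans, so workers within one level can proceed independently.

```python
from concurrent.futures import ThreadPoolExecutor


def connected(s1, s2, edges):
    """True if some join predicate links a relation in s1 to one in s2."""
    return any((a in s1 and b in s2) or (a in s2 and b in s1) for a, b in edges)


def card_of(s, base_card, sel):
    """Estimated cardinality of joining all relations in s (assumed model)."""
    c = 1.0
    for r in s:
        c *= base_card[r]
    for (a, b), f in sel.items():
        if a in s and b in s:
            c *= f
    return c


def join_candidates(pairs, best, base_card, sel):
    """Worker: evaluate one chunk of (left, right) pairs; return local winners."""
    local = {}
    for left, right in pairs:
        # Filter step: drop overlapping pairs and cross products.
        if left & right or not connected(left, right, sel):
            continue
        s = left | right
        # Hypothetical cost: cost of both inputs plus size of the result.
        cost = best[left][0] + best[right][0] + card_of(s, base_card, sel)
        if s not in local or cost < local[s][0]:
            local[s] = (cost, (best[left][1], best[right][1]))
    return local


def optimize(base_card, sel, workers=4):
    """Return (cost, plan) for a connected query graph given as sel's keys."""
    best = {frozenset([r]): (0.0, r) for r in base_card}
    by_size = {1: list(best)}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for size in range(2, len(base_card) + 1):
            # Generate step: pair every smaller plan set with its complement size.
            pairs = [(l, r)
                     for k in range(1, size // 2 + 1)
                     for l in by_size.get(k, [])
                     for r in by_size.get(size - k, [])]
            # Naive round-robin allocation of pairs to workers (an assumption).
            chunks = [pairs[i::workers] for i in range(workers)]
            results = pool.map(
                lambda c: join_candidates(c, best, base_card, sel), chunks)
            # Merge: workers only read `best`; the main thread writes it.
            level = {}
            for local in results:
                for s, cand in local.items():
                    if s not in level or cand[0] < level[s][0]:
                        level[s] = cand
            best.update(level)
            by_size[size] = list(level)
    return best[frozenset(base_card)]


if __name__ == "__main__":
    cards = {"A": 1000, "B": 100, "C": 10}
    sels = {("A", "B"): 0.01, ("B", "C"): 0.1}
    print(optimize(cards, sels))
```

For the three-table chain in the usage example, joining B with C first is cheaper under this assumed cost model (about 1100 versus 2000 for joining A with B first). Most generated pairs in larger queries are rejected by the filter step, which is the waste the abstract's skip vector array is meant to reduce.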