Scheduling query execution plans is an important component of query optimization in parallel database systems. The problem is particularly complex in a shared-nothing execution environment, where each system node represents a collection of time-shareable resources (e.g., CPUs and disks) and communicates with other nodes only by message passing. Significant research effort has concentrated on only a subset of the various forms of intra-query parallelism so that scheduling and synchronization are simplified. In addition, most previous work has focused on one-dimensional models of parallel query scheduling, effectively ignoring the potential benefits of resource sharing. In this paper, we develop an approach that is more general in both directions, capturing all forms of intra-query parallelism and exploiting sharing of multi-dimensional resource nodes among concurrent plan operators. This allows scheduling a set of independent query tasks (i.e., operator pipelines) to be seen as an instance of the multi-dimensional bin-design problem. Using a novel quantification of coarse-grain parallelism, we present a list scheduling heuristic that is provably near-optimal in the class of coarse-grain parallel executions (with a worst-case performance ratio that depends on the number of resources per node and the granularity parameter). We then extend this algorithm to handle the operator precedence constraints in a bushy query plan by splitting the execution of the plan into synchronized phases. Preliminary performance results confirm the effectiveness of our scheduling algorithm compared to both previous approaches and the optimal solution. Finally, we present a technique that allows us to relax the coarse-granularity restriction and obtain a list scheduling method that is provably near-optimal in the space of all possible parallel schedules.
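To make the idea of list scheduling over multi-dimensional resource nodes concrete, the following is a minimal Python sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes that each task is described by a work vector with one entry per resource (say, CPU time and disk time), and that a node's finish time is the largest total it accumulates in any single dimension, which amounts to assuming that the resources of a node overlap perfectly. The function name `list_schedule`, the longest-task-first ordering, and the sample work vectors are all hypothetical.

```python
from typing import List, Sequence, Tuple


def list_schedule(
    work_vectors: Sequence[Sequence[float]], num_nodes: int
) -> Tuple[List[int], float]:
    """Greedy list scheduling of tasks with d-dimensional work vectors.

    Assumption (not taken from the paper): the load of a node is the
    maximum, over resource dimensions, of the summed work placed on it.
    Returns the node chosen for each task and the resulting makespan.
    """
    dims = len(work_vectors[0])
    loads = [[0.0] * dims for _ in range(num_nodes)]
    assignment = [-1] * len(work_vectors)

    # Build the list: consider tasks in order of decreasing largest component.
    order = sorted(
        range(len(work_vectors)),
        key=lambda i: max(work_vectors[i]),
        reverse=True,
    )

    for i in order:
        task = work_vectors[i]
        # Pick the node whose bottleneck resource is least loaded after
        # adding this task; ties go to the lowest-numbered node.
        best = min(
            range(num_nodes),
            key=lambda n: max(l + w for l, w in zip(loads[n], task)),
        )
        assignment[i] = best
        loads[best] = [l + w for l, w in zip(loads[best], task)]

    makespan = max(max(node_load) for node_load in loads)
    return assignment, makespan


if __name__ == "__main__":
    # Hypothetical (CPU, disk) work vectors for four operator pipelines.
    tasks = [(4, 1), (1, 4), (3, 3), (2, 2)]
    assignment, makespan = list_schedule(tasks, num_nodes=2)
    print(assignment, makespan)
```

On this input the greedy rule yields a makespan of 7, while placing the first two tasks together and the last two together would give 5. The gap shows why a worst-case performance ratio, rather than optimality, is the natural guarantee for a heuristic of this kind.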