Parallel querying with non-dedicated computers

Authors:
Vijayshankar Raman;Wei Han;Inderpal Narang
Affiliations:
IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005

References (18)

Encapsulation of parallelism in the Volcano query processing system. SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data.
Parallel database systems: the future of high performance database systems. Communications of the ACM.
DB2 parallel edition. IBM Systems Journal.
Much ado about shared-nothing. ACM SIGMOD Record.
Eddies: continuously adaptive query processing. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data.
Optimization of parallel query execution plans in XPRS. PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems.
Garlic: a new flavor of federated query processing for DB2. Proceedings of the 2002 ACM SIGMOD international conference on Management of data.
Prototyping Bubba, A Highly Parallel Database System. IEEE Transactions on Knowledge and Data Engineering.
The Gamma Database Machine Project. IEEE Transactions on Knowledge and Data Engineering.
Dynamic and Load-balanced Task-Oriented Database Query Processing in Parallel Systems. EDBT '92 Proceedings of the 3rd International Conference on Extending Database Technology: Advances in Database Technology.
An Effective Algorithm for Parallelizing Hash Joins in the Presence of Data Skew. Proceedings of the Seventh International Conference on Data Engineering.
Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases.
NCR 3700 - The Next-Generation Industrial Database Computer. VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases.
Dynamic Multi-Resource Load Balancing in Parallel Database Systems. VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases.
Managing Intra-operator Parallelism in Parallel Database Systems. VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases.
Highly available, fault-tolerant, parallel dataflows. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data.
Resource Scheduling for Parallel Query Processing on Computational Grids. GRID '04 Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing.
Tuple routing strategies for distributed eddies. VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29.

Cited by (9)

Adaptive workload allocation in query processing in autonomous heterogeneous environments. Distributed and Parallel Databases.
Autonomic query parallelization using non-dedicated computers: an evaluation of adaptivity options. The VLDB Journal — The International Journal on Very Large Data Bases.
Automation everywhere: autonomics and data management. BNCOD'07 Proceedings of the 24th British national conference on Databases.
Load-balancing for WAN warehouses. DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications.
Run-time adaptivity for search computing. Search computing.
An efficient skew-insensitive algorithm for join processing on grid architectures. Proceedings of the fifth international workshop on High-level parallel programming and applications.
Efficient load balancing in partitioned queries under random perturbations. ACM Transactions on Autonomous and Adaptive Systems (TAAS) - Special section on formal methods in pervasive computing, pervasive adaptation, and self-adaptive systems: Models and algorithms.
Utility-driven adaptive query workload execution. Future Generation Computer Systems.
Just-in-time data distribution for analytical query processing. ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems.

Abstract

We present DITN, a new method of parallel querying based on dynamic outsourcing of join processing tasks to non-dedicated, heterogeneous computers. In DITN, partitioning is not the means of parallelism. Data layout decisions are made outside the scope of the DBMS and handled within the storage software; query processors see a "Data In The Network" image. This allows gradual scaleout as the workload grows, by using non-dedicated computers.

A typical operator in a parallel query plan is Exchange [7]. We argue that Exchange is unsuitable for non-dedicated machines because it addresses node heterogeneity poorly and is vulnerable to failures or load spikes during query execution. DITN uses an alternative, intra-fragment parallelism, in which each node executes an independent select-project-join-aggregate-group-by block, with no tuple exchange between nodes. This method handles heterogeneous nodes cleanly and adapts well during execution to node failures or load spikes.

Initial experiments suggest that DITN performs competitively with a traditional configuration of dedicated machines and well-partitioned data for at least up to 10 processors. At the same time, DITN gives significant flexibility in terms of gradual scaleout and handling of heterogeneity, load bursts, and failures.
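
The idea of independent join-and-aggregate blocks with no tuple exchange can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how work is divided or reassigned, so the on-demand work-unit queue, the simulated node speeds, the one-time failure, and every name below (`run_work_unit`, `ditn_style_query`, `node_speeds`, `fail_once`) are assumptions made purely for illustration, and the "nodes" are simulated inside one process.

```python
from collections import Counter, deque


def run_work_unit(orders_chunk, customers):
    """One independent join + group-by block; no tuples leave the node.

    orders_chunk: list of (customer_id, amount) pairs (hypothetical schema).
    customers:    list of (customer_id, region) pairs (hypothetical schema).
    """
    region_of = dict(customers)            # build side of a hash join
    partial = Counter()
    for customer_id, amount in orders_chunk:   # probe side
        region = region_of.get(customer_id)
        if region is not None:
            partial[region] += amount      # local partial aggregate
    return partial


def ditn_style_query(orders, customers, node_speeds, unit_size=2, fail_once=()):
    """Hand out work units on demand and merge the partial aggregates.

    node_speeds: node name -> work units it completes per round
                 (an assumed stand-in for heterogeneous hardware or load).
    fail_once:   node names that fail on their first work unit
                 (an assumed stand-in for a node failure or load spike).
    """
    pending = deque(
        orders[i:i + unit_size] for i in range(0, len(orders), unit_size)
    )
    total = Counter()
    units_done = Counter()
    already_failed = set()

    while pending:
        for node, speed in node_speeds.items():
            for _ in range(speed):
                if not pending:
                    break
                unit = pending.popleft()
                if node in fail_once and node not in already_failed:
                    already_failed.add(node)
                    pending.append(unit)   # re-queue the unit for another node
                    break
                total.update(run_work_unit(unit, customers))
                units_done[node] += 1

    return dict(total), dict(units_done)
```

Because each work unit produces a self-contained partial aggregate, a faster node in this sketch simply completes more units, and a unit lost to a failure is rerun elsewhere without coordinating with the other nodes.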