Efficient load balancing for wide-area divide-and-conquer applications

Authors:
Rob V. van Nieuwpoort; Thilo Kielmann; Henri E. Bal
Affiliations:
Faculty of Sciences, Division of Mathematics and Computer Science, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands (all authors)
Venue:
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Year:
2001

Abstract

Divide-and-conquer programs are easily parallelized by letting the programmer annotate potential parallelism in the form of spawn and sync constructs. To achieve efficient program execution, the generated work load has to be balanced evenly among the available CPUs. For single-cluster systems, Random Stealing (RS) is known to achieve optimal load balancing. However, RS is inefficient when applied to hierarchical wide-area systems where multiple clusters are connected via wide-area networks (WANs) with high latency and low bandwidth. In this paper, we experimentally compare RS with existing load-balancing strategies that are believed to be efficient for multi-cluster systems: Random Pushing and two variants of Hierarchical Stealing. We demonstrate that, in practice, they obtain less than optimal results. We introduce a novel load-balancing algorithm, Cluster-aware Random Stealing (CRS), which is highly efficient and easy to implement. CRS adapts itself to network conditions and job granularities, and does not require manually tuned parameters. Although CRS sends more data across the WANs, it is faster than its competitors for 11 out of 12 test applications with various WAN configurations. It has at most 4% overhead in run time compared to RS on a single, large cluster, even with high wide-area latencies and low wide-area bandwidths. These strong results suggest that divide-and-conquer parallelism is a useful model for writing distributed supercomputing applications on hierarchical wide-area systems.
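
One way to see why plain Random Stealing suffers on a multi-cluster system is to count how often a uniformly random victim sits in another cluster, so that the steal request must cross a WAN link. The Python sketch below is illustrative only: the function names and the 4-cluster, 16-nodes-per-cluster configuration are assumptions chosen for the example, not the paper's testbed or implementation, and it models only victim selection, not CRS itself.

```python
import random
from fractions import Fraction


def remote_steal_fraction(num_clusters: int, nodes_per_cluster: int) -> Fraction:
    """Probability that a uniformly random victim lies in another cluster."""
    total = num_clusters * nodes_per_cluster
    if total < 2:
        raise ValueError("need at least two nodes to steal")
    remote = (num_clusters - 1) * nodes_per_cluster
    # The thief never picks itself, so there are total - 1 candidate victims.
    return Fraction(remote, total - 1)


def pick_random_victim(thief: int, total_nodes: int, rng: random.Random) -> int:
    """Random Stealing victim choice: any node except the thief, uniformly."""
    victim = rng.randrange(total_nodes - 1)
    # Shift indices at or above the thief up by one to skip the thief.
    return victim if victim < thief else victim + 1


def simulate(num_clusters: int, nodes_per_cluster: int,
             attempts: int, seed: int = 0) -> float:
    """Empirical fraction of steal attempts that target a remote cluster."""
    rng = random.Random(seed)
    total = num_clusters * nodes_per_cluster
    remote = 0
    for _ in range(attempts):
        thief = rng.randrange(total)
        victim = pick_random_victim(thief, total, rng)
        # Nodes are numbered so that integer division yields the cluster id.
        if thief // nodes_per_cluster != victim // nodes_per_cluster:
            remote += 1
    return remote / attempts


if __name__ == "__main__":
    exact = remote_steal_fraction(4, 16)
    print(f"exact: {exact} (about {float(exact):.3f})")
    print(f"simulated: {simulate(4, 16, 20000):.3f}")
```

Under these assumed numbers, 48 of the 63 possible victims are remote, so roughly 76% of random steal attempts would pay wide-area latency. That share grows with the number of clusters, which is consistent with the abstract's observation that RS is inefficient on hierarchical wide-area systems.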