Divide-and-conquer programs are easily parallelized by letting the programmer annotate potential parallelism in the form of spawn and sync constructs. To achieve efficient program execution, the generated workload has to be balanced evenly among the available CPUs. For single-cluster systems, Random Stealing (RS) is known to achieve optimal load balancing. However, RS is inefficient when applied to hierarchical wide-area systems where multiple clusters are connected via wide-area networks (WANs) with high latency and low bandwidth.

In this paper, we experimentally compare RS with existing load-balancing strategies that are believed to be efficient for multi-cluster systems: Random Pushing and two variants of Hierarchical Stealing. We demonstrate that, in practice, they obtain less than optimal results. We introduce a novel load-balancing algorithm, Cluster-aware Random Stealing (CRS), which is highly efficient and easy to implement. CRS adapts itself to network conditions and job granularities, and does not require manually tuned parameters. Although CRS sends more data across the WANs, it is faster than its competitors for 11 out of 12 test applications with various WAN configurations. It has at most 4% overhead in run time compared to RS on a single, large cluster, even with high wide-area latencies and low wide-area bandwidths. These strong results suggest that divide-and-conquer parallelism is a useful model for writing distributed supercomputing applications on hierarchical wide-area systems.
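As an illustration of the baseline discussed above, the following is a minimal single-process sketch of Random Stealing on a divide-and-conquer job. It is not the paper's implementation and it does not model CRS, clusters, or WAN latency. The function name `run_random_stealing`, its parameters, and the round-based simulation loop are assumptions made for this example. The sync step is simplified away: leaf results are accumulated into one shared total instead of being joined at the parent.

```python
import random
from collections import deque


def run_random_stealing(lo, hi, workers=4, threshold=8, seed=0):
    """Sum the integers in [lo, hi) by divide and conquer.

    Hypothetical simulation: each worker owns a double-ended job queue.
    An owner takes the newest job from the tail; an idle worker picks a
    victim uniformly at random and steals the oldest job from its head.
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(workers)]
    queues[0].append((lo, hi))  # the root job starts on worker 0
    total = 0
    steals = 0

    # Simulate in rounds; every worker performs one action per round.
    while any(queues):
        for w in range(workers):
            if queues[w]:
                a, b = queues[w].pop()  # owner works on the newest job
                if b - a <= threshold:
                    total += sum(range(a, b))  # leaf job: compute directly
                else:
                    mid = (a + b) // 2
                    queues[w].append((a, mid))  # "spawn" the two halves
                    queues[w].append((mid, b))
            else:
                # Idle worker: choose a random victim other than itself.
                victim = rng.choice([v for v in range(workers) if v != w])
                if queues[victim]:
                    # Steal the oldest job, which covers the largest range.
                    queues[w].append(queues[victim].popleft())
                    steals += 1
    return total, steals


if __name__ == "__main__":
    total, steals = run_random_stealing(0, 1000)
    print("sum:", total, "successful steals:", steals)
```

Stealing from the head of the victim's queue hands over the largest remaining subproblem, so one steal moves a substantial amount of work. In a multi-cluster setting, a uniformly random victim may sit behind a WAN link, which is the inefficiency of RS that the abstract describes.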