Scalable Dynamic Load Balancing Using UPC

Authors:
Stephen Olivier; Jan Prins
Venue:
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Year:
2008

Abstract

An asynchronous work-stealing approach to dynamic load balancing is implemented in Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark. The UTS benchmark presents a synthetic tree-structured search space that is highly imbalanced. A parallel implementation of the search requires continuous dynamic load balancing to keep all processors engaged in the search. Our implementation achieves better scaling and parallel efficiency in both shared memory and distributed memory settings than previous efforts using UPC and MPI. We observe parallel efficiency of 80% using 1024 processors performing over 85,000 total load balancing operations per second continuously. The UPC programming model substantially simplifies the expression of the asynchronous work stealing protocol compared with MPI. However, obtaining performance portability with UPC in both shared memory and distributed memory settings requires careful use of one-sided reads and writes to minimize the impact of high latency communication. Further protocol refinements improve the dissemination of available work and decrease the cost of termination detection.
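
To put the reported figures in perspective, the short calculation below converts them into a speedup and a per-processor rate. It assumes the conventional definition of parallel efficiency (speedup divided by processor count); the abstract does not state which definition is used, and the symbols $E$, $S$, $p$, $T_1$, and $T_p$ are introduced here only for illustration.

```latex
% Assumed (conventional) definition of parallel efficiency:
%   T_1 = sequential run time, T_p = run time on p processors
E = \frac{S}{p} = \frac{T_1}{p \, T_p}

% Reported values: E = 0.80 at p = 1024
S = E \cdot p = 0.80 \times 1024 \approx 819

% Reported aggregate rate: more than 85{,}000 load balancing operations per second
\frac{85{,}000\ \text{ops/s}}{1024\ \text{processors}} \approx 83\ \text{ops/s per processor (average, lower bound)}
```

Under this assumption, the reported efficiency corresponds to a speedup of roughly 819 over a single processor, with each processor taking part in more than about 83 load balancing operations per second on average.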