Work stealing for multi-core HPC clusters

Authors:
Kaushik Ravichandran;Sangho Lee;Santosh Pande
Affiliations:
College of Computing, Georgia Institute of Technology;College of Computing, Georgia Institute of Technology;College of Computing, Georgia Institute of Technology
Venue:
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Year:
2011

Citing 10
Cited 5

Optimal static load balancing in distributed computer systems
Journal of the ACM (JACM)

An Algorithm for Optimal Static Load Balancing in Distributed Computer Systems
IEEE Transactions on Computers

The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation

Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)

Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)

Efficient load balancing for wide-area divide-and-conquer applications
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications

Scalable Dynamic Load Balancing Using UPC
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing

Scalable work stealing
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

UTS: an unbalanced tree search benchmark
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing

Dynamic distributed scheduling algorithm for state space search
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing

Using load information in work-stealing on distributed systems with non-uniform communication latencies
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing

A work-stealing scheduling framework supporting fault tolerance
Proceedings of the Conference on Design, Automation and Test in Europe

How to be a successful thief: feudal work stealing for irregular divide-and-conquer applications on heterogeneous distributed systems
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing

Load balancing non-uniform parallel computations
Proceedings of the 2013 workshop on Programming based on actors, agents, and decentralized control

Abstract

Today a significant fraction of HPC clusters are built from multi-core machines connected via a high-speed interconnect; hence, they have a mix of shared memory and distributed memory. Work stealing algorithms are currently designed for either a shared memory architecture or a distributed memory architecture, and are extended to work on these multi-core clusters by assuming a single underlying architecture. However, as the number of cores in each node increases, the differences between a shared memory architecture and a distributed memory architecture become more acute. Current work stealing approaches are not suitable for multi-core clusters due to the dichotomy of the underlying architecture. We combine the best aspects of both current approaches into a new algorithm. Our algorithm allows for more efficient execution of large-scale HPC applications, such as UTS, on clusters which have large multi-core nodes. As the number of cores per node increases, which is inevitable given today's processor trends, such an approach is crucial.