Hybrid parallel task placement in X10

Authors: Jeeva Paudel (University of Alberta); Olivier Tardieu (IBM T.J. Watson Research Center); José Nelson Amaral (University of Alberta)
Venue: Proceedings of the third ACM SIGPLAN X10 Workshop
Year: 2013

Abstract

This paper presents a hybrid parallel task-placement strategy that combines work stealing and work dealing to improve workload distribution across nodes in distributed shared-memory machines. Existing load balancers based on work dealing incur large performance penalties from two sources: excessive task migration, and excessive communication among nodes to determine the target node for a migrated task. This work instead employs a simple heuristic both to determine the load status of a node and to identify a good target for task migration. Experimental evaluation on applications chosen from the Cowichan and Lonestar suites shows that, on a cluster of 128 cores, the proposed approach achieves speedups of 2% to 16% over the state-of-the-art work-stealing scheduler.
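
The abstract does not spell out the heuristic, so the following Python sketch only illustrates the general idea of combining the two mechanisms; it is not the paper's algorithm and does not use X10. The queue-length thresholds `HIGH_WATERMARK` and `LOW_WATERMARK`, the `Node` class, and the functions `deal`, `steal`, and `balance` are all hypothetical names introduced here. In this toy model, an overloaded node deals surplus tasks to the least-loaded node only while that node is underloaded, and an idle node steals a single task from the most-loaded node.

```python
from collections import deque

# Hypothetical thresholds; the abstract does not give the paper's actual heuristic.
HIGH_WATERMARK = 8  # queue length above which a node counts as overloaded
LOW_WATERMARK = 2   # queue length at or below which a node counts as underloaded


class Node:
    """A toy node holding a double-ended queue of pending tasks."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.tasks = deque()

    def load(self):
        # Assumed load metric: the number of queued tasks.
        return len(self.tasks)


def deal(nodes, source):
    """Work dealing: an overloaded node pushes surplus tasks to underloaded nodes."""
    others = [n for n in nodes if n is not source]
    moved = 0
    while others and source.load() > HIGH_WATERMARK:
        target = min(others, key=Node.load)
        if target.load() > LOW_WATERMARK:
            break  # no underloaded target left, so stop migrating
        target.tasks.append(source.tasks.popleft())  # migrate the oldest task
        moved += 1
    return moved


def steal(nodes, thief):
    """Work stealing: an idle node takes one task from the most-loaded node."""
    others = [n for n in nodes if n is not thief]
    if thief.load() > 0 or not others:
        return False
    victim = max(others, key=Node.load)
    if victim.load() == 0:
        return False
    thief.tasks.append(victim.tasks.popleft())  # take the victim's oldest task
    return True


def balance(nodes):
    """One balancing round: overloaded nodes deal first, then idle nodes steal."""
    for node in nodes:
        if node.load() > HIGH_WATERMARK:
            deal(nodes, node)
    for node in nodes:
        steal(nodes, node)
```

Stopping the dealing loop as soon as no underloaded target remains is one simple way to bound task migration, which the abstract identifies as a cost of pure work dealing. A real distributed scheduler would also have to estimate remote load without querying every node, which this single-process sketch does not model.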