This paper presents a hybrid parallel task-placement strategy that combines work stealing and work dealing to improve workload distribution across nodes in distributed shared-memory machines. Existing load balancers based on work dealing incur large performance penalties from two sources: excessive task migration, and excessive communication among nodes to choose the target node for a migrated task. This work uses a simple heuristic both to determine the load status of a node and to identify a good target for task migration. Experimental evaluation on applications from the Cowichan and Lonestar suites shows that, on a cluster of 128 cores, the proposed approach achieves speedups of 2% to 16% over the state-of-the-art work-stealing scheduler.
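The abstract does not spell out the heuristic, so the following is only a minimal sketch of how a hybrid of work dealing and work stealing could be organized. Everything in it is an assumption made for illustration: the queue-length thresholds `LOW` and `HIGH`, the bounded number of `probes` used to find a target, and the names `Node`, `deal`, and `steal` are hypothetical and are not taken from the paper.

```python
from collections import deque
import random

# Hypothetical queue-length thresholds; the paper's actual heuristic is not given here.
LOW, HIGH = 2, 8


class Node:
    def __init__(self, name, tasks=()):
        self.name = name
        self.tasks = deque(tasks)

    def status(self):
        """Classify load from the local queue length only (assumed heuristic)."""
        n = len(self.tasks)
        if n <= LOW:
            return "light"
        if n >= HIGH:
            return "heavy"
        return "normal"


def deal(node, peers, rng, probes=2):
    """Work dealing: a heavy node pushes surplus tasks to a light peer.

    Only `probes` randomly chosen peers are queried, which bounds the
    communication spent on finding a target. Returns the number of tasks moved.
    """
    if node.status() != "heavy" or not peers:
        return 0
    candidates = rng.sample(peers, min(probes, len(peers)))
    target = min(candidates, key=lambda p: len(p.tasks))
    if target.status() != "light":
        return 0  # no good target found: skip migration rather than probe further
    surplus = (len(node.tasks) - len(target.tasks)) // 2
    for _ in range(surplus):
        # Migrate from the left end; the owner is assumed to work from the right end.
        target.tasks.append(node.tasks.popleft())
    return surplus


def steal(node, peers, rng):
    """Work stealing: an idle node tries to pull one task from one random peer."""
    if node.tasks or not peers:
        return 0
    victim = rng.choice(peers)
    if not victim.tasks:
        return 0  # failed steal attempt
    node.tasks.append(victim.tasks.popleft())
    return 1


rng = random.Random(0)
a, b, c = Node("a", range(12)), Node("b"), Node("c", range(4))
moved = deal(a, [b, c], rng)   # a is heavy, b is the lightest probed peer
d = Node("d")
stolen = steal(d, [a], rng)    # d is idle and pulls one task from a
```

In this sketch the overloaded node initiates migration (dealing), while an idle node falls back to pulling work itself (stealing). Capping the number of probed peers is one simple way to limit the target-selection communication that the abstract identifies as a cost of work dealing.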