Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
Communication complexity for parallel divide-and-conquer
SFCS '91 Proceedings of the 32nd annual symposium on Foundations of computer science
Parallel programming with MPI
Space-efficient scheduling of nested parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
Efficient load balancing for wide-area divide-and-conquer applications
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
The STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library
The STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library
Efficient Parallel Divide-and-Conquer for a Class of Interconnection Topologies
ISA '91 Proceedings of the 2nd International Symposium on Algorithms
Implementation of multilisp: Lisp on a multiprocessor
LFP '84 Proceedings of the 1984 ACM Symposium on LISP and functional programming
Executing functional programs on a virtual tree of processors
FPCA '81 Proceedings of the 1981 conference on Functional programming languages and computer architecture
Synchronized MIMD Computing
Performance evaluation of adaptive MPI
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Adaptive and reliable parallel computing on networks of workstations
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Intel® threading building blocks
Journal of Computing Sciences in Colleges
Scheduling multithreaded computations by work stealing
SFCS '94 Proceedings of the 35th Annual Symposium on Foundations of Computer Science
Deque-Free Work-Optimal Parallel STL Algorithms
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Map-reduce as a Programming Model for Custom Computing Machines
FCCM '08 Proceedings of the 2008 16th International Symposium on Field-Programmable Custom Computing Machines
A view of the parallel computing landscape
Communications of the ACM - A View of Parallel Computing
Designing efficient sorting algorithms for manycore GPUs
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Provably efficient two-level adaptive scheduling
JSSPP'06 Proceedings of the 12th international conference on Job scheduling strategies for parallel processing
Online mapping of MPI-2 dynamic tasks to processes and threads
International Journal of High Performance Systems Architecture
Improving the dynamic creation of processes in MPI-2
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Hi-index | 0.00 |
The message passing interface (MPI) is the standard in message passing parallel computation. MPI does not provide a canonical way to dynamically distribute run-time generated workload evenly across all the participating computer nodes. This paper proposes a MPI library to provide near-optimal dynamical workload balancing over branch and bound (B&B) algorithms; B&B potentially produces huge workload unbalance during a parallel execution. The library, named RaWSDM, provides a double ended queue (deque) data structure on which the programmer may pop, process, and later, pull back some parallel tasks; an underlying efficient system scheduler is responsible for keeping the workload balanced by exchanging elements among all deques. Theoretical bounds are traced and practical experiments are performed with the unlimited knapsack problem. Results show a performance gain up to 80% (best-case scenario) against a pure MPI implementation using round-robin scheduling, with near linear speedup and memory consumption.