Dynamic workload balancing deques for branch and bound algorithms in the message passing interface

Authors:
Stefano Mor; Nicolas Maillard
Affiliations:
Informatics Institute, Federal University of Rio Grande do Sul, Av. Bento Goncalves 9500, Porto Alegre, RS, Brazil (both authors)
Venue:
International Journal of High Performance Systems Architecture
Year:
2011

Abstract

The message passing interface (MPI) is the standard for message-passing parallel computation. MPI does not provide a canonical way to distribute workload generated at run time evenly across the participating compute nodes. This paper proposes an MPI library that provides near-optimal dynamic workload balancing for branch and bound (B&B) algorithms, which can produce a large workload imbalance during parallel execution. The library, named RaWSDM, provides a double-ended queue (deque) data structure from which the programmer may pop parallel tasks, process them, and later push some back; an efficient underlying scheduler keeps the workload balanced by exchanging elements among all deques. Theoretical bounds are derived, and practical experiments are performed with the unlimited knapsack problem. Results show a performance gain of up to 80% (best-case scenario) over a pure MPI implementation using round-robin scheduling, with near-linear speedup and memory consumption.
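
The pop, process, and push-back cycle described in the abstract can be illustrated with a small sketch. It is not the RaWSDM API, which this page does not show, and it uses no MPI: it simulates several workers inside a single Python process, each with its own deque, and an idle worker steals the oldest task from the fullest deque. The function name `solve_unbounded_knapsack`, the task layout, the stealing policy, and the fractional bound are all assumptions chosen for illustration. The problem solved is the unlimited (unbounded) knapsack problem, in which each item may be taken any number of times.

```python
from collections import deque


def solve_unbounded_knapsack(weights, values, capacity, workers=4):
    """Branch and bound over per-worker deques with simulated work stealing.

    Hypothetical illustration only: positive integer weights are assumed,
    and the workers run round-robin in one process instead of as MPI ranks.
    Returns (best value found, number of steals performed).
    """
    # Sort items by value density so the bound below stays valid at every node.
    items = sorted(zip(weights, values), key=lambda wv: wv[1] / wv[0], reverse=True)
    deques = [deque() for _ in range(workers)]
    # A task is (next item index, remaining capacity, value accumulated so far).
    deques[0].append((0, capacity, 0))
    best = 0
    steals = 0
    while any(deques):
        for own in deques:
            if not own:
                # Idle worker: take the oldest task from the fullest deque.
                victim = max(deques, key=len)
                if not victim:
                    continue
                own.append(victim.popleft())
                steals += 1
            # The owner pops the newest task, which gives depth-first search.
            index, remaining, value = own.pop()
            best = max(best, value)
            if index == len(items):
                continue
            weight, item_value = items[index]
            # Upper bound: fill the remaining capacity fractionally with the
            # densest item that is still allowed.
            bound = value + remaining * item_value / weight
            if bound <= best:
                continue  # prune: this subtree cannot beat the incumbent
            # Branch on the number of copies taken, pushing children back.
            for copies in range(remaining // weight + 1):
                own.append((index + 1,
                            remaining - copies * weight,
                            value + copies * item_value))
    return best, steals


if __name__ == "__main__":
    print(solve_unbounded_knapsack([2, 3, 4], [3, 5, 6], capacity=7))
```

Pushing the child with the most copies last means the owner explores the greedy choice first, which tends to raise the incumbent early and prune more. Stealing from the opposite end hands other workers tasks closer to the root, which usually represent larger subtrees; in a distributed setting that exchange would be carried by messages between processes.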