Load balancing for regular meshes on SMPs with MPI

Authors:
Vivek Kale;William Gropp
Affiliations:
University of Illinois at Urbana-Champaign, IL;University of Illinois at Urbana-Champaign, IL
Venue:
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Year:
2010

Citing 7
Cited 1

Cilk: an efficient multithreaded runtime system

Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
MPI versus MPI+OpenMP on IBM SP for the NAS benchmarks

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms

Journal of Parallel and Distributed Computing
Handling OS jitter on multicore multithreaded systems

IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
The bottom-up implementation of one MILC lattice QCD application on the cell blade

International Journal of Parallel Programming

Quantifying the potential task-based dataflow parallelism in MPI applications

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I

Quantified Score

Hi-index	0.00

Visualization

Abstract

Domain decomposition for regular meshes on parallel computers has traditionally been performed by attempting to exactly partition the work among the available processors (now cores). However, these strategies often do not consider the inherent system noise which can hinder MPI application scalability to emerging peta-scale machines with 10000+ nodes. In this work, we suggest a solution that uses a tunable hybrid static/dynamic scheduling strategy that can be incorporated into current MPI implementations of mesh codes. By applying this strategy to a 3D jacobi algorithm, we achieve performance gains of at least 16% for 64 SMP nodes.