An integrated runtime scheduler for MPI

Authors: Humaira Kamal; Alan Wagner
Affiliation: Dept. of Computer Science, University of British Columbia, Vancouver, Canada (both authors)
Venue: EuroMPI'12, Proceedings of the 19th European Conference on Recent Advances in the Message Passing Interface
Year: 2012

Abstract

Fine-Grain MPI (FG-MPI) supports function-level parallelism while staying within the MPI process model. It provides a runtime that is directly integrated into the MPICH2 middleware and uses lightweight coroutines to implement an MPI-aware scheduler. Our key observation is that having multiple MPI processes per OS process, together with a runtime scheduler, can simplify MPI programming and achieve performance without adding complexity to the program. The performance-related part of the program is thus moved out of the program's specification and into the runtime, where performance can be tuned with few, if any, changes to the code.
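To make the idea of several MPI processes sharing one OS process a little more concrete, here is a minimal sketch that uses Python generators as stand-ins for lightweight coroutines. It is not FG-MPI's implementation or API: the names `CoroutineScheduler`, `SEND`, `RECV` and `ring_body` are hypothetical, messages are plain Python objects, and there are no tags, communicators or inter-node transfers. Under those assumptions, it tries to show one thing "MPI-aware" scheduling could mean: a rank that posts a receive with no matching message is parked instead of polled, and a later send wakes it, so the scheduler only resumes ranks that can make progress.

```python
from collections import deque

# Operations a rank can yield to the scheduler (hypothetical protocol, not FG-MPI's).
SEND, RECV = "send", "recv"


class CoroutineScheduler:
    """Round-robin scheduler for several 'ranks' sharing one OS process."""

    def __init__(self):
        self.ready = deque()  # (rank, coroutine, value to resume it with)
        self.mailbox = {}     # rank -> FIFO of messages not yet received
        self.blocked = {}     # rank -> coroutine parked in a receive

    def spawn(self, rank, body):
        self.mailbox[rank] = deque()
        self.ready.append((rank, body(rank), None))

    def run(self):
        """Run until no rank is ready; return ranks still blocked (deadlock)."""
        while self.ready:
            rank, coro, value = self.ready.popleft()
            try:
                op = coro.send(value)  # resume the rank until its next MPI-like call
            except StopIteration:
                continue  # this rank has finished
            if op[0] == SEND:
                _, dest, msg = op
                if dest in self.blocked:
                    # Receiver is waiting: hand it the message and make it ready.
                    self.ready.append((dest, self.blocked.pop(dest), msg))
                else:
                    self.mailbox[dest].append(msg)
                self.ready.append((rank, coro, None))  # eager send: sender continues
            elif op[0] == RECV:
                if self.mailbox[rank]:
                    self.ready.append((rank, coro, self.mailbox[rank].popleft()))
                else:
                    self.blocked[rank] = coro  # park it; no busy-polling
        return sorted(self.blocked)


def ring_body(size, log):
    """Each rank receives a token, records it, and passes token + 1 on."""
    def body(rank):
        if rank == 0:
            yield (SEND, 1 % size, 0)
            token = yield (RECV,)
            log.append((rank, token))
        else:
            token = yield (RECV,)
            log.append((rank, token))
            yield (SEND, (rank + 1) % size, token + 1)
    return body


log = []
sched = CoroutineScheduler()
for r in range(4):
    sched.spawn(r, ring_body(4, log))
print(sched.run(), log)  # [] [(1, 0), (2, 1), (3, 2), (0, 3)]
```

Each rank is written as ordinary blocking send/receive code, while how many ranks are co-located and the order in which they run are decided by the scheduler rather than by the program text, roughly the separation the abstract describes. If `run` returns a non-empty list, those ranks were left waiting on receives that can never be matched.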