Scaling MPI to short-memory MPPs such as BG/L

Authors:
M. Farreras; T. Cortes; J. Labarta; G. Almasi
Affiliations:
Universitat Politecnica de Catalunya (UPC), Barcelona, Spain; Universitat Politecnica de Catalunya (UPC), Barcelona, Spain; Universitat Politecnica de Catalunya (UPC), Barcelona, Spain; IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
Proceedings of the 20th annual international conference on Supercomputing
Year:
2006

References: 12
Cited by: 3

The X-Kernel: An Architecture for Implementing Network Protocols

IEEE Transactions on Software Engineering
The NAS parallel benchmarks—summary and preliminary results

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Active messages: a mechanism for integrated communication and computation

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Credit-based flow control for ATM networks: credit update protocol, adaptive credit allocation and statistical multiplexing

SIGCOMM '94 Proceedings of the conference on Communications architectures, protocols and applications
U-Net: a user-level network interface for parallel and distributed computing

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
High performance messaging on workstations: Illinois fast messages (FM) for Myrinet

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
A Dynamic Periodicity Detector: Application to Speedup Computation

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
A Scalable Flow Control Algorithm for the Fast Messages Communication Library

CANPC '99 Proceedings of the Third International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications
Efficient Communication Using Message Prediction for Cluster Multiprocessors

CANPC '00 Proceedings of the 4th International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications
An overview of the BlueGene/L Supercomputer

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Exploring the Predictability of MPI Messages

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Incorporating Memory Management into User-Level Network Interfaces

Evaluating Sparse Data Storage Techniques for MPI Groups and Communicators

ICCS '08 Proceedings of the 8th international conference on Computational Science, Part I
Transparent redundant computing with MPI

EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
A low impact flow control implementation for offload communication interfaces

EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface

Abstract

Scalability to a large number of processes is one of the weaknesses of current MPI implementations. Standard implementations are able to scale to hundreds of nodes, but not beyond. The main problem in these implementations is that they assume certain resources (for both data and control data) will always be available to receive and process unexpected messages. As we will show, this assumption does not always hold, especially on short-memory machines such as BG/L, which has 64K nodes but only 512 Mbytes of memory per node.

The objective of this paper is to present an algorithm that improves the robustness of MPI implementations on short-memory MPPs: by taking care of both data and control-data reception, it allows the system to scale up to any number of nodes. The proposed solution achieves this goal without any observable overhead when memory is not scarce. Furthermore, in the worst case, when memory resources are extremely scarce, the overhead never doubles the execution time; and in that extreme situation, traditional MPI implementations would fail to execute at all.
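A rough calculation, not taken from the paper, illustrates why the assumption of always-available receive resources breaks down at this scale. Assume that 64K means $P = 2^{16}$ nodes and 512 Mbytes means $M = 2^{29}$ bytes, and take the hypothetical worst case in which every other node sends one unexpected message to the same receiver at the same time. Even if the receiver devoted all of its memory to buffering those messages, each sender could be given at most

$$ \frac{M}{P-1} = \frac{2^{29}\ \text{bytes}}{2^{16}-1} \approx 2^{13}\ \text{bytes} = 8\ \text{Kbytes}. $$

In practice the budget is smaller still, because the application, the MPI library itself, and the control data describing each pending message also occupy node memory. A receiver that buffers unexpected messages of more than a few kilobytes from every sender can therefore exhaust its memory.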