This talk will describe MPICH2, an all-new implementation of MPI designed to support both MPI-1 and MPI-2 and to enable further research into MPI implementation technology. To achieve high performance and scalability, and to encourage experimentation, the design of MPICH2 is strongly modular. For example, the MPI topology routines can easily be replaced by implementations tuned to a specific environment, such as a geographically dispersed computational grid. The interface to the communication layers has been designed to exploit modern interconnects that are capable of remote memory access, yet it can also be implemented on older networks. An initial TCP-based implementation will be described in detail, illustrating the use of a simple communication-channel interface. A multi-method device that provides TCP, VIA, and shared-memory communication will also be discussed.

Performance results for point-to-point and collective communication will be presented. These illustrate the advantages of the new design: the point-to-point TCP performance is close to the raw achievable latency and bandwidth, and the collective routines are significantly faster than the "classic" MPICH versions (by more than a factor of two in some cases). Performance issues that arise in supporting MPI_THREAD_MULTIPLE will be discussed, and the role of a proper choice of implementation abstraction in achieving low overhead will be illustrated with results from the MPICH2 implementation.

Scalability to tens or hundreds of thousands of processors is another goal of the MPICH2 design. The talk will describe some of the features of MPICH2 that address scalability issues, as well as current research targeting a system with 64K processing elements.