Feasibility study of MPI implementation on the heterogeneous multi-core Cell BE™ architecture
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Thread-safety in an MPI implementation: Requirements and analysis
Parallel Computing
Virtual machine aware communication libraries for high performance computing
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A Buffered-Mode MPI Implementation for the Cell BE™ Processor
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
A Prototype Implementation of MPI for SMARTMAP
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Exploiting Direct Access Shared Memory for MPI On Multi-Core Processors
International Journal of High Performance Computing Applications
Optimizing a parallel runtime system for multicore clusters: a case study
Proceedings of the 2010 TeraGrid Conference
Scalability of communicators and groups in MPI
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
A GPGPU transparent virtualization component for high performance computing clouds
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Implementing MPI on Windows: comparison with common approaches on Unix
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Locality and topology aware intra-node communication among multicore CPUs
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
A uGNI-based MPICH2 Nemesis network module for the Cray XE
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Performance evaluation of thread-based MPI in shared memory
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Implementation and shared-memory evaluation of MPICH2 over the Nemesis communication subsystem
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Improving communication latency with the write-only architecture
Journal of Parallel and Distributed Computing
A synchronous mode MPI implementation on the Cell BE™ architecture
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
An integrated runtime scheduler for MPI
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Using DVFS to optimize time warp simulations
Proceedings of the Winter Simulation Conference
Redesigning MPI shared memory communication for large multi-core architecture
Computer Science - Research and Development
On the performance of concurrent transfers in collective algorithms
Proceedings of the 20th European MPI Users' Group Meeting
Proposing a new task model towards many-core architecture
Proceedings of the First International Workshop on Many-core Embedded Systems
Evaluation of messaging middleware for high-performance cloud computing
Personal and Ubiquitous Computing
This paper presents Nemesis, a new low-level communication subsystem. Nemesis is designed and implemented to be scalable and efficient both for intranode communication over shared memory and for internode communication over high-performance networks, and it natively supports multiple communication methods. Nemesis has been integrated into MPICH2 as a CH3 channel, where it delivers better performance than the other dedicated communication channels in MPICH2. Furthermore, the resulting MPICH2 architecture outperforms other MPI implementations in point-to-point benchmarks.
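
The idea of a multimethod communication layer can be illustrated with a minimal sketch. This is not Nemesis or MPICH2 code: the `Process` type, the `select_method` and `send` helpers, and the node names are all hypothetical, and the only assumption carried over from the abstract is that peers on the same node communicate through shared memory while peers on different nodes use the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """Hypothetical description of one MPI-style process."""
    rank: int
    node: str  # hypothetical host identifier


def select_method(sender: Process, receiver: Process) -> str:
    """Pick a transport: shared memory within a node, network across nodes."""
    return "shared-memory" if sender.node == receiver.node else "network"


def send(sender: Process, receiver: Process, payload: bytes, log: list) -> str:
    """Record which transport a message would take (no real I/O happens)."""
    method = select_method(sender, receiver)
    log.append((sender.rank, receiver.rank, method, len(payload)))
    return method


# Two processes share nodeA; the third lives on nodeB (illustrative layout).
procs = [Process(0, "nodeA"), Process(1, "nodeA"), Process(2, "nodeB")]
log = []
send(procs[0], procs[1], b"hello", log)  # same node
send(procs[0], procs[2], b"hello", log)  # different nodes
```

In this sketch the caller uses one `send` entry point and the transport is chosen per destination, which is one plausible reading of "natively multimethod-enabled".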