The paper presents MPICH-CM, a new communication architecture for message-passing systems, developed for MPICH-V, an MPI implementation for P2P systems. In MPICH-CM, nodes communicate through dedicated Channel Memories, which provide a fully decoupled communication medium. New properties of communication based on MPICH-CM are described in comparison with other communication architectures, with emphasis on grid-like and volunteer computing systems. The first implementation of MPICH-CM is realized as a special MPICH device connected to Channel Memory servers. To estimate the overhead of MPICH-CM, its performance is presented for basic point-to-point and collective operations in comparison with the MPICH p4 implementation.
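The decoupling idea can be illustrated with a minimal sketch, under stated assumptions. The class name `ChannelMemory` and its `put`/`get` methods are hypothetical and are not the MPICH-CM API; the sketch runs in a single process, whereas the abstract describes Channel Memories as separate servers. It only shows the general pattern in which a sender deposits a message in an intermediary and the receiver collects it later, so the two nodes never communicate directly.

```python
from collections import defaultdict, deque


class ChannelMemory:
    """Hypothetical in-process stand-in for a Channel Memory server."""

    def __init__(self):
        # One FIFO queue per (source rank, destination rank) pair.
        self._queues = defaultdict(deque)

    def put(self, src, dst, payload):
        # The sender deposits the message and returns immediately;
        # the receiver does not need to be reachable at this moment.
        self._queues[(src, dst)].append(payload)

    def get(self, src, dst):
        # The receiver collects the oldest pending message from src,
        # or None if nothing has been deposited yet.
        queue = self._queues[(src, dst)]
        return queue.popleft() if queue else None


cm = ChannelMemory()
cm.put(0, 1, b"hello")   # rank 0 sends to rank 1 via the channel memory
cm.put(0, 1, b"world")
first = cm.get(0, 1)     # rank 1 receives in FIFO order
second = cm.get(0, 1)
empty = cm.get(0, 1)     # no message left for this pair
```

In this sketch, message order is preserved per sender and receiver pair, and a receive on an empty queue returns `None` instead of blocking; a real implementation would make its own choices on both points.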