The number of processors embedded in high-performance computing platforms is growing daily to satisfy users' desire to solve larger and more complex problems. Parallel runtime environments have to support and adapt to the underlying libraries and hardware, which require a high degree of scalability in dynamic environments. This paper presents the design of a scalable and fault-tolerant protocol for supporting parallel runtime environment communications. The protocol is designed to support transmission of messages across multiple nodes within a self-healing topology to protect against recursive node and process failures. A formal protocol verification has validated the protocol for both the normal and failure cases. We have implemented multiple routing algorithms for the protocol and concluded that the variant rule-based routing algorithm yields the best overall results for damaged and incomplete topologies.
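
To make the idea of routing through a damaged topology concrete, the following is a minimal sketch and not the protocol described above. It assumes a hypothetical overlay in which each node links to the nodes at distance plus or minus a power of two on a ring, and it uses a plain breadth-first search in place of the rule-based routing, whose rules are not given here. The names `neighbors`, `route`, and `failed` are illustrative only.

```python
from collections import deque


def neighbors(node, n):
    """Links of `node` in an ASSUMED overlay of n nodes:
    node +/- 2^k (mod n) for every power of two below n."""
    out = set()
    step = 1
    while step < n:
        out.add((node + step) % n)
        out.add((node - step) % n)
        step *= 2
    out.discard(node)  # no self-links
    return out


def route(src, dst, n, failed=frozenset()):
    """Breadth-first search for a shortest path from src to dst that
    avoids every node in `failed`. Returns the hop list, or None if
    dst is unreachable. This is a stand-in, not the paper's routing."""
    if src in failed or dst in failed:
        return None
    parent = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:  # walk parents back to src
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in neighbors(cur, n):
            if nxt not in failed and nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None
```

With eight nodes, `route(0, 4, 8)` uses the direct link, while `route(0, 5, 8, failed={1, 4, 6, 7})` has to detour through the surviving nodes 2 and 3. A search like this needs a global view of which nodes have failed; a rule-based scheme would presumably decide hop by hop from local information, which is a different design point.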