Computing in the RAIN: A Reliable Array of Independent Nodes
IEEE Transactions on Parallel and Distributed Systems
Computing in the RAIN: A Reliable Array of Independent Nodes
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Hi-index | 0.00 |
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly reliable distributed systems by leveraging commercially available personal computers, workstations and interconnect technologies. In particular, the issue of reliable communication is addressed by introducing redundancy in the form of multiple network interfaces per compute node.When using compute nodes with multiple network connections the question of how to determine connectivity between nodes arises. We examine a connectivity protocol that guarantees that each side of a point-to-point connection sees the same history of activity over the communication channel. In other words, we maintain a consistent history of the state of the communication channel. At any give moment in time the histories as seen by each side are guaranteed to be identical to within some number of transitions. This bound on how much one side may lead or lag the other is the slack.Our main contributions are: (i) a simple, stable protocol for monitoring connectivity that maintains a consistent history with bounded slack, and (ii) proofs that this protocol exhibits correctness, bounded slack, and stability.