A Damage- and Fault-Tolerant Input/Output Network
IEEE Transactions on Computers
Pluribus: a reliable multiprocessor
AFIPS '75 Proceedings of the May 19-22, 1975, national computer conference and exposition
Since the inception of the ARPA Network¹ in 1969, we have been part of the group responsible for the development of that network's communications subnet. This role has given us a unique opportunity to study the problems of network reliability and the effects of attempted improvements, particularly in the context of rapid network growth. Our overall philosophy for this effort has been that the network should be fault-tolerant with respect to individual component errors, and that the IMPs themselves should be fault-tolerant with respect to local failures. Along with this concern, we feel that the program should provide as much diagnostic information as possible. Component failures are of several kinds: hardware or software; solid, intermittent, or one-time. As we will discuss in the following sections, our attention has shifted in the last few years from handling circuit errors and failures to handling more difficult problems in the IMPs themselves: first intermittent problems, and recently even solid failures of major components.