A Tandem T16 computer system is a network of up to 256 nodes. Each node consists of two to sixteen processors. The system has three major design goals: (1) continuous data availability; (2) modular growth by adding processing elements to a node; and (3) support for a network of geographically distributed nodes for on-line transaction processing. This talk sketches Tandem's approach to continuous data availability. At the hardware level, NonStop is achieved by designing a single-fault-tolerant system. A Tandem system has two or more modules and paths for each function. In addition, the system design addresses issues such as power failure, on-line maintenance, and reconfiguration.