The Reliable Clustered Computing project created a system that enables applications to improve the reliability of off-the-shelf computers from a typical 99% (about 90 hours of downtime per year) to 99.99% (under one hour of downtime per year) in a cost-effective manner. The chief constraints were the need to achieve high reliability while minimizing cost and maintaining vendor independence. This was realized by creating a vendor-independent clustered configuration of two or more computers that can recover from hardware or software errors either by restarting one or more processes on the current machine or by failing over one or more processes to another machine. Only two inexpensive custom hardware components were required for this solution: a WatchDog, to monitor component status, and a PowerDog, to control electrical power to processing elements (and optional peripherals). The bulk of the functionality was provided by software.
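
The downtime figures quoted above follow from simple arithmetic on the stated percentages. The sketch below is illustrative only and is not part of the project's software; the function name `annual_downtime_hours` and the assumption of a 365-day year are ours.

```python
# Illustrative arithmetic only; names and the 365-day year are assumptions.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the expected hours of downtime per year at a given availability."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return HOURS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # The two availability levels named in the abstract.
    for pct in (99.0, 99.99):
        hours = annual_downtime_hours(pct)
        print(f"{pct}% available: {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

Under these assumptions, 99% corresponds to roughly 87.6 hours of downtime per year, which the abstract rounds to about 90, and 99.99% corresponds to roughly 53 minutes, consistent with "under one hour".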