Fault detection in an Ethernet network via anomaly detectors
Portable and Fault-Tolerant Software Systems
IEEE Micro
Geographically Distributed System for Catastrophic Recovery
LISA '02 Proceedings of the 16th USENIX conference on System administration
Probabilistic resource allocation in heterogeneous distributed systems with random failures
Journal of Parallel and Distributed Computing
This paper contains an analysis of client/server outage data and presents a list of outage causes extracted from the data. The outage causes include hardware, software, operations, and environmental failures, as well as outages due to planned reconfigurations. The study spans all client, server, and network devices in a typical client/server environment. The paper illustrates how to use the data to predict client/server availability and evaluate potential availability improvements. The results are stated in terms of annual user outage minutes and have been validated by comparison with other outage surveys and data in the literature. The major results from the outage data study are: