Advances in Computer Architecture
An exponential failure/load relationship: results of a multi-computer statistical study
Performance considerations for the reliability analysis of computing systems.
On Evaluating the Performability of Degradable Computing Systems
IEEE Transactions on Computers
A measurement-based model for workload dependence of CPU errors
IEEE Transactions on Computers - The MIT Press scientific computation series
Measurement-Based Analysis of Error Latency
IEEE Transactions on Computers
Analyze-NOW: an environment for collection and analysis of failures in a network of workstations
ISSRE '96 Proceedings of the Seventh International Symposium on Software Reliability Engineering
A Study of Software Failures and Recovery in the MVS Operating System
IEEE Transactions on Computers
A model for space-correlated failures in large-scale distributed systems
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
In this correspondence we present a statistical model that relates mean computer failure rates to the level of system activity. Our analysis reveals a strong statistical dependency of both hardware and software component failure rates on several common measures of utilization (specifically CPU utilization, I/O initiation, paging, and job-step initiation rates). We establish that this effect is not dominated by a specific component type but exists across the board in the two systems studied. Our data cover three years of normal operation (including significant upgrades and reconfigurations) for two large Stanford University computer complexes. The complexes, which are composed of IBM mainframe equipment of differing models and vintages, run similar operating systems and provide the same interface and capability to their users. The empirical data come from identically structured and maintained failure logs at the two sites, along with IBM OS/VS2 operating system performance/load records.
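
As a rough illustration of what relating failure rate to load can look like in practice, the sketch below fits a curve of the form rate = a * exp(b * load) by ordinary least squares on the logarithm of the rate. The exponential form is an assumption made here for illustration only; the abstract does not state the functional form of the model. The function name `fit_exponential_rate` and the sample values are hypothetical and synthetic, and are not taken from the study's failure logs.

```python
import math


def fit_exponential_rate(loads, rates):
    """Fit rate = a * exp(b * load) by least squares on ln(rate).

    loads: utilization measurements (e.g. fraction of CPU busy).
    rates: observed mean failure rates at those loads (must be > 0).
    Returns the tuple (a, b).
    """
    n = len(loads)
    log_rates = [math.log(r) for r in rates]
    mean_x = sum(loads) / n
    mean_y = sum(log_rates) / n
    # Slope and intercept of the straight line ln(rate) = ln(a) + b * load.
    sxx = sum((x - mean_x) ** 2 for x in loads)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, log_rates))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic, noise-free values generated from the assumed model itself;
# they only show that the fit recovers the parameters it was given.
loads = [0.1, 0.3, 0.5, 0.7, 0.9]
rates = [0.02 * math.exp(3.0 * u) for u in loads]
a, b = fit_exponential_rate(loads, rates)
```

With real logs, each (load, rate) pair would come from grouping failures by the utilization level at which they occurred. Noise in those estimates would make a goodness-of-fit check necessary before reading anything into the fitted slope.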