High-Availability Computer Systems
Computer
An empirical study of operating systems errors
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Networked Windows NT System Field Failure Data Analysis
PRDC '99 Proceedings of the 1999 Pacific Rim International Symposium on Dependable Computing
Characterization of the Impact of Faulty Drivers on the Robustness of the Linux Kernel
DSN '04 Proceedings of the 2004 International Conference on Dependable Systems and Networks
Improving the reliability of commodity operating systems
Device drivers have been claimed to be the most error-prone part of kernel source code, and many error-tolerance and error-prevention approaches have been developed or proposed in response to this claim. However, after analyzing three months of event logs and maintenance records from Dawning4000A, we find that device driver errors are not the most critical cause of crashes on this former TOP10 supercomputer. We believe that device driver errors call for development and debugging effort rather than tolerance mechanisms. We also suggest that drivers be made more tolerant of device errors, especially errors from storage devices.