This paper analyzes the error behavior of a 3.2 TB disk storage system. We report reliability data for 18 months of the prototype's operation and analyze 6 months of error logs from nodes in the prototype. We found that the disk drives were among the most reliable components in the system. We divided the errors into eleven categories, comprising disk errors, network errors, and SCSI errors, that appeared repeatedly across all nodes. We also gained insight into the types of error messages that devices report under various conditions and into the effects of these events on the operating system. Finally, we present data from four cases of disk drive failure. These results and insights should be useful to any designer of a fault-tolerant storage system.