Distributed file systems need to be robust in the face of failures. In this work, we study the failure handling and recovery mechanisms of a widely used distributed file system, Linux NFS. Using fault injection, we examine the behavior of NFS when important metadata is corrupted. We find that the NFS protocol behaves in unexpected ways in the presence of these corruptions. On some occasions, incorrect errors are communicated to the client application; in others, the system hangs applications or crashes outright; in a few cases, success is falsely reported when an operation has failed. We use the results of our study to draw lessons for future designs and implementations of the NFS protocol.
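
The fault-injection approach summarized above can be pictured with a small toy model. The sketch below is illustrative only and is not the harness used in the study: `Inode`, `toy_read`, `inject_and_classify`, and the `Outcome` categories are hypothetical names, and the record layout is not the Linux NFS on-disk or wire format. It corrupts one metadata field at a time, runs an operation, and sorts the result into categories loosely modeled on the outcomes the abstract lists (hangs are not modeled).

```python
from dataclasses import dataclass, replace
from enum import Enum


class Outcome(Enum):
    # Hypothetical categories, loosely modeled on the outcomes in the abstract.
    INCORRECT_ERROR = "an error is reported, but it misdescribes the problem"
    FALSE_SUCCESS = "success is reported although the result is wrong"
    CRASH = "the operation fails with an unhandled exception"
    MASKED = "the corruption has no visible effect"


@dataclass(frozen=True)
class Inode:
    # Hypothetical metadata record; NOT the real NFS or ext3 layout.
    file_type: str  # "file" or "dir"
    size: int
    data: bytes


def toy_read(inode):
    """Toy read path that trusts its metadata with only thin validation."""
    if inode.file_type == "dir":
        raise IsADirectoryError("EISDIR")
    if inode.size < 0:
        raise OSError("EINVAL")
    return inode.data[:inode.size]


def inject_and_classify(inode, field, bad_value):
    """Corrupt one field of a known-good record, read it, classify the result."""
    corrupted = replace(inode, **{field: bad_value})
    try:
        result = toy_read(corrupted)
    except OSError:
        # The toy server never reports "metadata corrupted", so any error it
        # does report misdescribes what actually went wrong.
        return Outcome.INCORRECT_ERROR
    except Exception:
        # Anything else stands in for a crash of the server or client.
        return Outcome.CRASH
    if result != inode.data:
        return Outcome.FALSE_SUCCESS
    return Outcome.MASKED


good = Inode(file_type="file", size=5, data=b"hello")

for field, bad_value in [("file_type", "dir"), ("size", 2), ("size", None), ("size", 100)]:
    print(field, repr(bad_value), "->", inject_and_classify(good, field, bad_value).name)
```

Comparing each result against the known-good record is what allows a falsely reported success to be told apart from a corruption that happens to be harmless; a real study would make the same comparison against the behavior of an uncorrupted server.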