This paper describes the design, implementation, and performance of a stable-storage service built on top of the Unix operating system. The service allows servers to create, access, and delete persistent memory that survives server crashes. We describe its functionality and exported operations, discuss our experience with the implementation and its measured performance, and give concrete examples of its use in implementing real fault-tolerant distributed protocols.
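To make the create, access, and delete operations more concrete, the following is a minimal sketch of how persistent memory that survives a server crash might be layered on an ordinary Unix file system. It is not the service described in the paper: the class `StableStore`, its methods `write`, `read`, and `delete`, and the one-file-per-region layout are all hypothetical names and choices made for illustration. The sketch assumes a POSIX system where renaming a file within one directory is atomic and where `fsync` on a directory makes the rename durable.

```python
import os
import tempfile


class StableStore:
    """Hypothetical sketch: each persistent region is one file under `root`."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, name):
        return os.path.join(self.root, name)

    def _sync_dir(self):
        # Flush the directory entry so a rename or unlink survives a crash
        # (assumes a POSIX system that allows fsync on a directory).
        dfd = os.open(self.root, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)

    def write(self, name, data):
        # Create or update a region: write a temporary file, force it to
        # disk, then atomically rename it over the previous version, so a
        # crash leaves either the old contents or the new, never a mix.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self._path(name))
        except BaseException:
            if os.path.exists(tmp):
                os.remove(tmp)
            raise
        self._sync_dir()

    def read(self, name):
        # Access a region; raises FileNotFoundError if it does not exist.
        with open(self._path(name), "rb") as f:
            return f.read()

    def delete(self, name):
        # Delete a region and make the removal durable.
        os.remove(self._path(name))
        self._sync_dir()
```

A server that restarts after a crash could construct a new `StableStore` over the same directory and call `read` to recover the state it last wrote. The write-then-rename pattern trades an extra file creation per update for the guarantee that a region is never observed half-written.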