Abstract: Rollback techniques that use message logging and deterministic replay can be used in parallel systems to recover a failed node without involving other nodes. Distributed shared memory (DSM) systems cannot directly apply message-passing logging techniques because they use inherently nondeterministic asynchronous communication. This paper presents new logging schemes that reduce the typically high overhead for logging in DSM. Our algorithm for sequentially consistent systems tracks rather than logs accesses to shared memory. In an extension of this method to lazy release consistency, the per-access overhead of tracking has been completely eliminated. Measurements with parallel applications show a significant reduction in failure-free overhead.
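The contrast between tracking and logging accesses can be illustrated with a small sketch. It is not the paper's algorithm, only one plausible reading of the idea under stated assumptions: a node counts its shared-memory accesses and writes a log entry only when an asynchronous update arrives, recording the access count at that moment. On recovery, the updates are re-applied at the same counts, so the replayed reads see the same values. The names `TrackingNode`, `ReplayNode`, `receive_update`, and `demo` are hypothetical, and unwritten addresses are assumed to read as 0.

```python
from dataclasses import dataclass, field


@dataclass
class TrackingNode:
    """Hypothetical DSM node: counts shared accesses, logs only incoming updates."""
    memory: dict = field(default_factory=dict)
    access_count: int = 0
    log: list = field(default_factory=list)  # entries: (access_count, addr, value)

    def read(self, addr):
        self.access_count += 1  # tracking: bump a counter, write nothing to the log
        return self.memory.get(addr, 0)

    def receive_update(self, addr, value):
        # Asynchronous, nondeterministic event: record the access count at arrival.
        self.log.append((self.access_count, addr, value))
        self.memory[addr] = value


@dataclass
class ReplayNode:
    """Recovering node: re-applies logged updates at the same access counts."""
    pending: list
    memory: dict = field(default_factory=dict)
    access_count: int = 0

    def read(self, addr):
        # Apply every update that originally arrived before this access.
        while self.pending and self.pending[0][0] == self.access_count:
            _, logged_addr, logged_value = self.pending.pop(0)
            self.memory[logged_addr] = logged_value
        self.access_count += 1
        return self.memory.get(addr, 0)


def demo():
    # Failure-free run: two updates arrive between four reads of "x".
    node = TrackingNode()
    original = [node.read("x")]
    node.receive_update("x", 7)
    original += [node.read("x"), node.read("x")]
    node.receive_update("x", 9)
    original.append(node.read("x"))

    # Recovery: replay the same four reads against the logged updates only.
    replay = ReplayNode(pending=list(node.log))
    replayed = [replay.read("x") for _ in range(4)]
    return original, replayed, node.log
```

In this sketch the log grows with the number of incoming updates rather than with the number of accesses, which is the kind of saving the abstract attributes to tracking. The remaining per-access cost is the counter increment.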