Reduced Overhead Logging for Rollback Recovery in Distributed Shared Memory

Authors:
G. Suri;B. Janssens
Affiliations:
-;-
Venue:
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Year:
1995

Abstract:
Rollback techniques that use message logging and deterministic replay can be used in parallel systems to recover a failed node without involving other nodes. Distributed shared memory (DSM) systems cannot directly apply message-passing logging techniques because they use inherently nondeterministic asynchronous communication. This paper presents new logging schemes that reduce the typically high overhead for logging in DSM. Our algorithm for sequentially consistent systems tracks rather than logs accesses to shared memory. In an extension of this method to lazy release consistency, the per-access overhead of tracking has been completely eliminated. Measurements with parallel applications show a significant reduction in failure-free overhead.
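
The first sentence of the abstract refers to the general idea of message logging with deterministic replay. As a rough illustration only, the Python sketch below shows that idea for a plain message-passing node; it is not the DSM scheme proposed in the paper. The `Node` class, its methods, and the arithmetic in `apply` are hypothetical names and choices made for this example, and an in-memory list stands in for stable storage.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node whose state changes only when it handles a message."""
    state: int = 0
    checkpoint: int = 0
    log: list = field(default_factory=list)  # stands in for stable storage

    def apply(self, msg: int) -> None:
        # Deterministic handler: same state + same message -> same new state.
        self.state = self.state * 31 + msg

    def receive(self, msg: int) -> None:
        # Record the message before acting on it so it can be replayed later.
        self.log.append(msg)
        self.apply(msg)

    def take_checkpoint(self) -> None:
        # Messages handled before the checkpoint no longer need replaying.
        self.checkpoint = self.state
        self.log.clear()

    def recover(self) -> None:
        # Restore the checkpoint, then replay logged messages in their
        # original order; no other node has to roll back.
        self.state = self.checkpoint
        for msg in self.log:
            self.apply(msg)


def demo():
    node = Node()
    node.receive(3)
    node.take_checkpoint()
    node.receive(5)
    node.receive(7)
    before = node.state
    node.state = -1  # simulate a crash that loses volatile state
    node.recover()
    return before, node.state
```

Calling `demo()` returns the state before the simulated crash and the state after recovery, which match because the handler is deterministic and the log preserves message order. The abstract's point is that DSM accesses do not arrive as such an ordered message stream, which is why this approach cannot be applied to DSM directly.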