Munin: distributed shared memory based on type-specific memory coherence
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
ACM SIGOPS Operating Systems Review
Lightweight logging for lazy release consistent distributed shared memory
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Quantifying the performance differences between PVM and TreadMarks
Journal of Parallel and Distributed Computing
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Scalable fault-tolerant distributed shared memory
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
JIAJIA: A Software DSM System Based on a New Cache Coherence Protocol
HPCN Europe '99 Proceedings of the 7th International Conference on High-Performance Computing and Networking
Software DSM Protocols that Adapt between Single Writer and Multiple Writer
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Efficiently Adapting to Sharing Patterns in Software DSMs
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Logging and Recovery in Adaptive Software Distributed Shared Memory Systems
SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
Coherence-Based Coordinated Checkpointing for Software Distributed Shared Memory Systems
ICDCS '00 Proceedings of the The 20th International Conference on Distributed Computing Systems ( ICDCS 2000)
Home-based shared virtual memory
Home-based shared virtual memory
Fault-Tolerant Distributed Shared Memory on a Broadcast-Based Architecture
IEEE Transactions on Parallel and Distributed Systems
Locks and barriers in checkpointing and recovery
CCGRID '04 Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid
Libckpt: transparent checkpointing under Unix
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
Hi-index | 0.00 |
Distributed shared memory (DSM) creates an abstraction of a physical shared memory that parallel programmers can access. Most recent software DSM systems provide relaxed-memory models that guarantee consistency only at synchronization operations, such as locks and barriers. As the main goal of DSM systems is to provide support for long-term computation-intensive applications, checkpointing and recovery mechanisms are highly desirable. This article presents and evaluates the integration of a coordinated checkpointing mechanism to the barrier primitive that is usually provided with many DSM systems. Our results on some popular benchmarks and a real parallel application show that the overhead introduced during the failure-free execution is often small.