ACM SIGOPS Operating Systems Review
Scope consistency: a bridge between release consistency and entry consistency
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Quantifying the performance differences between PVM and TreadMarks
Journal of Parallel and Distributed Computing
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Scalable fault-tolerant distributed shared memory
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
JIAJIA: A Software DSM System Based on a New Cache Coherence Protocol
HPCN Europe '99 Proceedings of the 7th International Conference on High-Performance Computing and Networking
Software DSM Protocols that Adapt between Single Writer and Multiple Writer
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Efficiently Adapting to Sharing Patterns in Software DSMs
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Logging and Recovery in Adaptive Software Distributed Shared Memory Systems
SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
Coherence-Based Coordinated Checkpointing for Software Distributed Shared Memory Systems
ICDCS '00 Proceedings of the The 20th International Conference on Distributed Computing Systems ( ICDCS 2000)
Home-based shared virtual memory
Home-based shared virtual memory
Locks and barriers in checkpointing and recovery
CCGRID '04 Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid
TreadMarks: distributed shared memory on standard workstations and operating systems
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Libckpt: transparent checkpointing under Unix
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
Hi-index | 0.00 |
Distributed Shared Memory (DSM) creates an abstraction of a physical shared memory that parallel programmers can access. Most recent software DSMs provide relaxed memory models that guarantee consistency only at synchronization operations. As the main goal of DSM systems is to provide support for long term computation intensive applications, checkpointing and recovery mechanisms are highly desirable. This article presents and evaluates the integration of a coordinated checkpointing mechanism to the barrier primitive that is usually provided with many DSM systems. Our results on some popular benchmarks and a real parallel application show that the overhead introduced during the failure-free execution is often small.