Software write detection for a distributed shared memory

Authors:
Matthew J. Zekauskas;Wayne A. Sawdon;Brian N. Bershad
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA;School of Computer Science, Carnegie Mellon University, Pittsburgh, PA;Department of Computer Science and Engineering, University of Washington, Seattle, WA
Venue:
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Year:
1994

Abstract

Most software-based distributed shared memory (DSM) systems rely on the operating system's virtual memory interface to detect writes to shared data. Strategies based on virtual memory page protection create two problems for a DSM system. First, writes can have high overhead because they are detected with a page fault. As a result, a page must be written many times to amortize the cost of that fault. Second, a virtual memory page is too large to serve as a unit of coherency, which induces false sharing. Mechanisms to handle false sharing can increase runtime overhead and may cause data to be communicated unnecessarily between processors. In this paper, we present a new method for write detection that solves these problems. Our method relies on the compiler and runtime system to detect writes to shared data without invoking the operating system. We measure and compare implementations of a distributed shared memory system using both strategies, virtual memory and compiler/runtime, running a range of applications on a small-scale distributed-memory multicomputer. We show that the new method has low average write latency and supports fine-grained sharing with low overhead. Further, we show that the dominant cost of write detection with either strategy is due to the mechanism used to handle fine-grained sharing.
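
The abstract does not describe the mechanism in detail, so the following is only a minimal sketch of the general idea of compiler/runtime write detection, not the authors' implementation. It assumes that stores to shared data are routed through a small runtime hook that records which fixed-size line was written, so no page fault or operating system call is needed. All names here (`shared_store_u32`, `collect_dirty_lines`, `LINE_BYTES`, the single static region) are hypothetical and chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes for illustration only: one 4 KB shared region,
 * tracked at 64-byte granularity (much finer than a VM page). */
#define REGION_BYTES 4096u
#define LINE_BYTES   64u
#define NUM_LINES    (REGION_BYTES / LINE_BYTES)

static uint8_t shared_region[REGION_BYTES];
static uint8_t dirty[NUM_LINES];   /* one flag per software "line" */

/* Hypothetical hook that a compiler could emit in place of a plain
 * 32-bit store to shared data. It performs the store and then marks
 * the affected line(s) dirty, entirely in user space.
 * The caller must ensure offset + 4 <= REGION_BYTES. */
static void shared_store_u32(size_t offset, uint32_t value)
{
    memcpy(&shared_region[offset], &value, sizeof value);
    dirty[offset / LINE_BYTES] = 1;
    /* A store may straddle a line boundary, so mark the last byte too. */
    dirty[(offset + sizeof value - 1) / LINE_BYTES] = 1;
}

/* At a synchronization point the runtime scans the flags, reports the
 * indices of modified lines in ascending order, and clears those flags.
 * Returns the number of indices written to out (at most max_out). */
static size_t collect_dirty_lines(size_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_LINES && n < max_out; i++) {
        if (dirty[i]) {
            out[n++] = i;
            dirty[i] = 0;
        }
    }
    return n;
}
```

In this sketch each write costs a store plus one or two flag updates, while the scan at synchronization time grows with the number of tracked lines. That is one plausible way the handling of fine-grained sharing could become a significant cost, though the sketch makes no claim about the paper's measurements.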