Using memory-mapped network interfaces to improve the performance of distributed shared memory

Authors:
L. I. Kontothanassis;M. L. Scott
Affiliations:
-;-
Venue:
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Year:
1996

Citing 27
Cited 13

The design and analysis of parallel algorithms

The design and analysis of parallel algorithms
Compiler-Directed Cache Management in Multiprocessors

Computer
Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
Distributed Shared Memory: A Survey of Issues and Algorithms

Computer - Distributed computing systems: separate resources acting as one
Implementation and performance of Munin

SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
The Stanford Dash Multiprocessor

Computer
SPLASH: Stanford parallel applications for shared-memory

ACM SIGARCH Computer Architecture News
Lazy release consistency for software distributed shared memory

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Active messages: a mechanism for integrated communication and computation

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Cache write policies and performance

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Parallel programming in Split-C

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Evaluating stream buffers as a secondary cache replacement

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Virtual memory mapped network interface for the SHRIMP multicomputer

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Separating data and control transfer in distributed operating systems

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
High performance software coherence for current and future architectures

Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
Lazy release consistency for hardware-coherent multiprocessors

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Memory Channel Network for PCI

IEEE Micro
An Evaluation of Multiprocessor Cache Coherence Based on Virtual Memory Support

Proceedings of the 8th International Symposium on Parallel Processing
Using simple page placement policies to reduce the cost of cache fills in coherent shared-memory systems

IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors

MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
Software cache coherence for large scale multiprocessors

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Improving Release-Consistent Shared Virtual Memory using Automatic Update

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
TreadMarks: distributed shared memory on standard workstations and operating systems

WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Software write detection for a distributed shared memory

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation

Understanding application performance on shared virtual memory systems

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Scope consistency: a bridge between release consistency and entry consistency

Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
VM-based shared memory on low-latency, remote-memory-access networks

Proceedings of the 24th annual international symposium on Computer architecture
Monitoring shared virtual memory performance on a Myrinet-based PC cluster

ICS '98 Proceedings of the 12th international conference on Supercomputing
Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Accelerating shared virtual memory via general-purpose network interface support

ACM Transactions on Computer Systems (TOCS)
Runtime optimizations for a Java DSM implementation

Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande
Source-level global optimizations for fine-grain distributed shared memory systems

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Removing the overhead from software-based shared memory

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Shared Virtual Memory Clusters with Next-Generation Interconnection Networks and Wide Compute Nodes

HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
Using System Emulation to Model Next-Generation Shared Virtual Memory Clusters

Cluster Computing
Shared virtual memory clusters: bridging the cost-performance gap between SMPs and hardware DSM systems

Journal of Parallel and Distributed Computing
Shared memory computing on clusters with symmetric multiprocessors and system area networks

ACM Transactions on Computer Systems (TOCS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Shared memory is widely believed to provide an easier programming model than message passing for expressing parallel algorithms. Distributed Shared Memory (DSM) systems provide the illusion of shared memory on top of standard message passing hardware at very low implementation cost, but provide acceptable performance for only a limited class of applications. We argue that the principal sources of overhead overhead in DSM systems can be dramatically reduced with modest amounts of hardware support (substantially less than is required for hardware cache coherence). Specifically, we present and evaluate a family of protocols designed to exploit hardware support for a global, but non-coherent, physical address space. We consider systems both with and without remote cache fills, fine-grain access faults, "doubled" writes to local and remote memory, and merging write buffers. We also consider varying levels of latency and bandwidth. We evaluate our protocols using execution driven simulation, comparing them to each other and to a state-of-the-art protocol for traditional message-based networks. For the programs in our application suite, protocols taking advantage of the global address space improve performance by a minimum of 50% and sometimes by as much as an order of magnitude.