Efficient shared memory with minimal hardware support

Authors:
Leonidas I. Kontothanassis;Michael L. Scott
Affiliations:
Department of Computer Science, University of Rochester, Rochester, NY;Department of Computer Science, University of Rochester, Rochester, NY
Venue:
ACM SIGARCH Computer Architecture News
Year:
1995

Citing 19
Cited 0

The implementation of a coherent memory abstraction on a NUMA multiprocessor: experiences with platinum

SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Hector: A Hierarchically Structured Shared-Memory Multiprocessor

Computer - Special issue on experimental research in computer architecture
Distributed Shared Memory: A Survey of Issues and Algorithms

Computer - Distributed computing systems: separate resources acting as one
Implementation and performance of Munin

SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
The Stanford Dash Multiprocessor

Computer
SPLASH: Stanford parallel applications for shared-memory

ACM SIGARCH Computer Architecture News
Lazy release consistency for software distributed shared memory

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Virtual memory mapped network interface for the SHRIMP multicomputer

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Fine-grain access control for distributed shared memory

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
High performance software coherence for current and future architectures

Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
Lazy release consistency for hardware-coherent multiprocessors

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
An Evaluation of Multiprocessor Cache Coherence Based on Virtual Memory Support

Proceedings of the 8th International Symposium on Parallel Processing
Kernel Support for the Wisconsin Wind Tunnel

USENIX Microkernels and Other Kernel Architectures Symposium
Software cache coherence for large scale multiprocessors

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Distributed Shared Memory for New Generation Networks

Distributed Shared Memory for New Generation Networks
TreadMarks: distributed shared memory on standard workstations and operating systems

WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Software write detection for a distributed shared memory

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Shared memory is widely regarded as a more intuitive model than message passing for the development of parallel programs. A shared memory model can be provided by hardware, software, or some combination of both. One of the most important problems to be solved in shared memory environments is that of cache coherence. Experience indicates, unsurprisingly, that hardware-coherent multiprocessors greatly outperform distributed shared-memory (DSM) emulations on message-passing hardware. Intermediate options, however, have received considerably less attention. We argue in this position paper that one such option---a multiprocessor or network that provides a global physical address space in which processors can make non-coherent accesses to remote memory without trapping into the kernel or interrupting remote processors---can provide most of the performance of hardware cache coherence at little more monetary or design cost than traditional DSM systems. To support this claim we have developed the Cashmere family of software coherence protocols for NCC-NUMA (Non-Cache-Coherent, Non-Uniform-Memory Access) systems, and have used execution-driven simulation to compare the performance of these protocols to that of full hardware coherence and distributed shared memory emulation. We have found that for a large class of applications the performance of NCC-NUMA multiprocessors rivals that of fully hardware-coherent designs, and significantly surpasses the performance realized on more traditional DSM systems.