Removing the overhead from software-based shared memory

Authors:
Zoran Radović;Erik Hagersten
Affiliations:
Uppsala University, Sweden;Uppsala University, Sweden
Venue:
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Year:
2001

Abstract

The implementation presented in this paper, DSZOOM-WF, is a sequentially consistent, fine-grained, software-based distributed shared memory. It demonstrates a protocol-handling overhead below one microsecond for all the actions involved in a remote load operation, compared with around ten microseconds for the fastest implementation to date.

The all-software protocol assumes some basic low-level primitives in the cluster interconnect and an operating-system bypass functionality, similar to the emerging InfiniBand standard. All interrupt- and/or poll-based asynchronous protocol processing is removed by running the entire coherence protocol in the requesting processor. This not only removes the asynchronous overhead but also makes use of a processor that would otherwise stall. The technique is applicable to both page-based and fine-grain software-based shared memory.

DSZOOM-WF consistently demonstrates performance comparable to hardware-based distributed shared memory implementations.
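The idea of running the whole coherence protocol in the requesting processor can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the DSZOOM-WF protocol itself: the abstract does not specify the interconnect primitives, so the one-sided operations `remote_swap` and `remote_get`, the `Node` class, and the per-line directory layout are all hypothetical names chosen for illustration. The point it shows is that the home node runs no handler: the requester locks the directory entry, fetches the data, updates the sharer set, and releases the lock by itself.

```python
import threading


class Node:
    """Hypothetical cluster node: memory plus a per-line directory.

    The methods model one-sided interconnect operations issued by a
    remote requester. No protocol handler ever runs on this node.
    """

    def __init__(self, num_lines):
        self.memory = [0] * num_lines
        self.dir_lock = [0] * num_lines              # 0 = free, 1 = held
        self.dir_sharers = [set() for _ in range(num_lines)]
        self._atomic = threading.Lock()              # models hardware atomicity only

    def remote_swap(self, line, new):
        """Assumed primitive: atomically swap the directory lock word."""
        with self._atomic:
            old = self.dir_lock[line]
            self.dir_lock[line] = new
            return old

    def remote_get(self, line):
        """Assumed primitive: one-sided read of a memory line."""
        return self.memory[line]


def remote_load(requester_id, home, line, cache):
    """Serve a load miss entirely in the requesting processor."""
    # 1. Acquire the directory entry; spin until the swap returns 0 (free).
    while home.remote_swap(line, 1) == 1:
        pass
    # 2. Fetch the data and record the requester as a sharer.
    value = home.remote_get(line)
    home.dir_sharers[line].add(requester_id)
    cache[line] = value
    # 3. Release the directory entry.
    home.remote_swap(line, 0)
    return value


if __name__ == "__main__":
    home = Node(num_lines=4)
    home.memory[2] = 99
    local_cache = {}
    print(remote_load(1, home, 2, local_cache))
```

Because the requester would stall on the miss anyway, the spin and the directory update in this sketch use cycles that would otherwise be idle, which is the effect the abstract describes.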