MGS: a multigrain shared memory system

Authors:
Donald Yeung; John Kubiatowicz; Anant Agarwal
Affiliation:
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA (all authors)
Venue:
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Year:
1996

Abstract

Parallel workstations, each comprising 10-100 processors, promise cost-effective general-purpose multiprocessing. This paper explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. We call these systems Distributed Scalable Shared-memory Multiprocessors (DSSMPs).

This paper introduces the design of a shared memory system that uses multiple granularities of sharing, and presents an implementation on the Alewife multiprocessor, called MGS. Multigrain shared memory enables the collaboration of hardware and software shared memory, and is effective at exploiting a form of locality called multigrain locality. The system provides efficient support for fine-grain cache-line sharing, and resorts to coarse-grain page-level sharing only when locality is violated. A framework for characterizing application performance on DSSMPs is also introduced.

Using MGS, an in-depth study of several shared memory applications is conducted to understand the behavior of DSSMPs. We find that unmodified shared memory applications can exploit multigrain sharing. Keeping the number of processors fixed, applications execute up to 85% faster when each DSSMP node is a multiprocessor as opposed to a uniprocessor. We also show that tightly-coupled multiprocessors hold a significant performance advantage over DSSMPs on unmodified applications. However, a best-effort implementation of a kernel from one of the applications allows a DSSMP to almost match the performance of a tightly-coupled multiprocessor.
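To make the division of labor between fine-grain and coarse-grain sharing concrete, the following is a toy Python model. It is an illustrative sketch, not the MGS protocol. It assumes a hypothetical machine in which processors are grouped into DSSMP nodes. Sharing among processors of the same node is left to hardware cache coherence, and a page is fetched by software over the network only when a processor on another node first touches it. The names `ToyMultigrainMemory`, `node_of`, `PAGE_BYTES`, `PROCS_PER_NODE`, `NUM_NODES`, and the round-robin home-node placement are all hypothetical. The model also ignores writes, invalidation, and memory consistency entirely.

```python
from dataclasses import dataclass, field

# Hypothetical toy parameters, chosen for illustration only; not MGS's values.
PAGE_BYTES = 4096      # granularity of software (inter-node) sharing
PROCS_PER_NODE = 4     # processors per DSSMP node
NUM_NODES = 2          # number of DSSMP nodes on the network


def node_of(proc: int) -> int:
    """Map a processor id to the DSSMP node that contains it."""
    return proc // PROCS_PER_NODE


@dataclass
class ToyMultigrainMemory:
    # page number -> set of node ids currently holding a copy of that page
    copies: dict = field(default_factory=dict)
    # number of accesses resolved by each mechanism
    counts: dict = field(default_factory=lambda: {"hardware": 0, "software": 0})

    def access(self, proc: int, addr: int) -> str:
        """Classify one (read-only) access as hardware- or software-handled."""
        page = addr // PAGE_BYTES
        node = node_of(proc)
        # Assumption: each page starts out resident only on a home node
        # chosen round-robin by page number.
        holders = self.copies.setdefault(page, {page % NUM_NODES})
        if node in holders:
            # Intra-node sharing: hardware keeps caches coherent per cache line.
            kind = "hardware"
        else:
            # Inter-node sharing: software fetches the whole page, after which
            # processors on this node share it through hardware coherence.
            holders.add(node)
            kind = "software"
        self.counts[kind] += 1
        return kind


if __name__ == "__main__":
    mem = ToyMultigrainMemory()
    trace = [(0, 0), (1, 64), (4, 128), (5, 256), (4, 0)]  # (processor, address)
    print([mem.access(p, a) for p, a in trace])
    print(mem.counts)
```

In this trace, page 0 is homed on node 0, so processors 0 and 1 share it through hardware. Processor 4 on node 1 pays one software page fetch, after which processors 4 and 5 also share the page through hardware. Only the access that crosses a node boundary falls back to page-level sharing, which is the kind of locality the toy is meant to illustrate.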