Software-extended coherent shared memory: performance and cost

Authors:
D. Chaiken; A. Agarwal
Affiliations:
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA; Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA
Venue:
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Year:
1994


Abstract

This paper evaluates the tradeoffs involved in the design of the software-extended memory system of Alewife, a multiprocessor architecture that implements coherent shared memory through a combination of hardware and software mechanisms. For each block of memory, Alewife implements between zero and five coherence directory pointers in hardware and allows software to handle requests when the pointers are exhausted. The software includes a flexible coherence interface that facilitates protocol software implementation. This interface is indispensable for conducting experiments and has proven important for implementing enhancements to the basic system.

Simulations of a number of applications running on a complete system (with up to 256 processors) demonstrate that the hybrid architecture with five pointers achieves between 71% and 100% of full-map directory performance at a constant cost per processing element. Our experience in designing the software protocol interfaces and experiments with a variety of system configurations lead to a detailed understanding of the interaction of the hardware and software components of the system. The results show that a small amount of shared memory hardware provides adequate performance: One-pointer systems reach between 42% and 100% of full-map performance on our parallel benchmarks. A software-only directory architecture with no hardware pointers has lower performance but minimal cost.
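
The idea of a small hardware directory that overflows into software can be illustrated with a toy model. The sketch below is not Alewife's protocol or its coherence interface; the class `DirectoryEntry`, its fields, and the trap counter are hypothetical names chosen for illustration, and the model ignores messages, acknowledgements, and timing. It only shows how the number of hardware pointers per block determines how often software has to step in.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Toy directory state for one memory block (hypothetical model)."""
    hw_pointers: int                                 # pointers kept in hardware (0 to 5 in the abstract)
    hw_sharers: list = field(default_factory=list)   # node ids recorded in hardware
    sw_sharers: set = field(default_factory=set)     # node ids recorded by software after overflow
    sw_traps: int = 0                                # requests that needed software handling

    def read(self, node: int) -> str:
        """Record a reader and report which mechanism handled the request."""
        if node in self.hw_sharers or node in self.sw_sharers:
            return "already recorded"
        if len(self.hw_sharers) < self.hw_pointers:
            self.hw_sharers.append(node)             # a free hardware pointer exists
            return "hardware"
        self.sw_sharers.add(node)                    # pointers exhausted: software extends the directory
        self.sw_traps += 1
        return "software"

    def write(self, node: int) -> list:
        """Invalidate every other sharer and return their node ids."""
        invalidated = sorted((set(self.hw_sharers) | self.sw_sharers) - {node})
        if self.sw_sharers or self.hw_pointers == 0:
            self.sw_traps += 1                       # software must take part in the invalidation
        self.hw_sharers, self.sw_sharers = [], set()
        if self.hw_pointers > 0:
            self.hw_sharers.append(node)             # writer becomes the only recorded copy
        else:
            self.sw_sharers.add(node)                # software-only directory records the writer
        return invalidated


if __name__ == "__main__":
    entry = DirectoryEntry(hw_pointers=2)
    print([entry.read(n) for n in (0, 1, 2, 3)])     # third and fourth readers overflow
    print(entry.write(0), entry.sw_traps)
```

With `hw_pointers=0` every request in this model goes through software, which corresponds to the software-only configuration in the abstract; a larger value keeps more sharing patterns entirely in hardware.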