Cooperative shared memory: software and hardware for scalable multiprocessors

Authors:
Mark D. Hill;James R. Larus;Steven K. Reinhardt;David A. Wood
Affiliations:
-;-;-;-
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
1993

Citing 25
Cited 40

An evaluation of directory schemes for cache coherence

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A cache coherence scheme with fast selective invalidation

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Efficient synchronization primitives for large-scale cache-coherent multiprocessors

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Analysis of cache invalidation patterns in multiprocessors

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Computer architecture: a quantitative approach

Computer architecture: a quantitative approach
Cache considerations for multiprocessor programmers

Communications of the ACM
Trap architectures for Lisp systems

LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
Paradigm: A Highly Scalable Shared-Memory Multicomputer Architecture

Computer - Special issue on cryptography
Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
Software prefetching

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
LimitLESS directories: A scalable cache coherence scheme

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Comparative evaluation of latency reducing and tolerating techniques

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparison of hardware and software cache coherence schemes

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The Stanford Dash Multiprocessor

Computer
SPLASH: Stanford parallel applications for shared-memory

ACM SIGARCH Computer Architecture News
Cooperative shared memory: software and hardware for scalable multiprocessor

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Mechanisms for cooperative shared memory

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Implementing a cache consistency protocol

ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
A virtual machine emulator for performance evaluation

Communications of the ACM
Verifying a Multiprocessor Cache Controller Using Random Test Generation

IEEE Design & Test
The Scalable Coherent Interface and Related Standards Projects

IEEE Micro
The DASH Prototype: Logic Overhead and Performance

IEEE Transactions on Parallel and Distributed Systems
An economical solution to the cache coherence problem

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
CACHE COHERENCE PROTOCOLS FOR LARGE-SCALE MULTIPROCESSORS

CACHE COHERENCE PROTOCOLS FOR LARGE-SCALE MULTIPROCESSORS

Compiling for shared-memory and message-passing computers

ACM Letters on Programming Languages and Systems (LOPLAS)
Cost/performance of a parallel computer simulator

PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Where is time spent in message-passing and shared-memory programs?

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A comprehensive bibliography of distributed shared memory

ACM SIGOPS Operating Systems Review
Efficient strategies for software-only protocols in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Effective distributed scheduling of parallel workloads

Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Modeling cost/performance of a parallel computer simulator

ACM Transactions on Modeling and Computer Simulation (TOMACS)
High-performance sorting on networks of workstations

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Retrospective: tempest and typhoon: user-level shared memory

25 years of the international symposia on Computer architecture (selected papers)
Tempest and typhoon: user-level shared memory

25 years of the international symposia on Computer architecture (selected papers)
Performance of database workloads on shared-memory systems with out-of-order processors

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Tolerating late memory traps in ILP processors

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Memory sharing predictor: the key to a speculative coherent DSM

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Selective, accurate, and timely self-invalidation using last-touch prediction

Proceedings of the 27th annual international symposium on Computer architecture
Implicit coscheduling: coordinated scheduling with implicit information in distributed systems

ACM Transactions on Computer Systems (TOCS)
Application-specific protocols for user-level shared memory

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Distributed Shared Memory: Concepts and Systems

IEEE Parallel & Distributed Technology: Systems & Technology
Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Programming Effort vs. Performance with a Hybrid Programming Model for Distributed Memory Parallel Architectures

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Exploiting high-level coherence information to optimize distributed shared state

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Inferential queueing and speculative push for reducing critical communication latencies

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Using cache optimizing compiler for managing software cache on distributed shared memory system

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Token coherence: decoupling performance and correctness

Proceedings of the 30th annual international symposium on Computer architecture
Tolerating Late Memory Traps in Dynamically Scheduled Processors

IEEE Transactions on Computers
Coherence decoupling: making use of incoherence

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
A comparative evaluation of hardware-only and software-only directory protocols in shared-memory multiprocessors

Journal of Systems Architecture: the EUROMICRO Journal
A localizing directory coherence protocol

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
System-wide performance monitors and their application to the optimization of coherent memory accesses

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared memory computing on clusters with symmetric multiprocessors and system area networks

ACM Transactions on Computer Systems (TOCS)
Inferential queueing and speculative push

International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Integrating XML and CORBA to support collaborative writing using off-the-shelf editing software

Information Systems
Circulating shared-registers for multiprocessor systems

Journal of Systems Architecture: the EUROMICRO Journal
Rethink the sync

ACM Transactions on Computer Systems (TOCS)
Cohesion: a hybrid memory model for accelerators

Proceedings of the 37th annual international symposium on Computer architecture
WAYPOINT: scaling coherence to thousand-core architectures

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Exploiting locality: a flexible DSM approach

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Scaling up the preventive replication of autonomous databases in cluster systems

VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
Why on-chip cache coherence is here to stay

Communications of the ACM
A new perspective for efficient virtual-cache coherence

Proceedings of the 40th Annual International Symposium on Computer Architecture

Quantified Score

Hi-index	0.02

Visualization

Abstract

We believe the paucity of massively parallel, shared-memory machines follows from the lack of a shared-memory programming performance model that can inform programmers of the cost of operations (so they can avoid expensive ones) and can tell hardware designers which cases are common (so they can build simple hardware to optimize them). Cooperative shared memory, our approach to shared-memory design, addresses this problem.Our initial implementation of cooperative shared memory uses a simple programming model, called Check-In/Check-Out (CICO), in conjunction with even simpler hardware, called Dir1SW. In CICO, programs bracket uses of shared data with a check_in directive terminating the expected use of the data. A cooperative prefetch directive helps hide communication latency. Dir1SW is a minimal directory protocol that adds little complexity to message-passing hardware, but efficiently supports programs written within the CICO model.