Architectural mechanisms for explicit communication in shared memory multiprocessors

Authors:
Umakishore Ramachandran; Gautam Shah; Anand Sivasubramaniam; Aman Singla; Ivan Yanasak
Affiliations:
College of Computing, Georgia Institute of Technology, Atlanta, GA (all authors)
Venue:
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Year:
1995

Abstract

The goal of this work is to explore architectural mechanisms for supporting explicit communication in cache-coherent shared memory multiprocessors. The motivation stems from the observation that applications vary widely in their sharing characteristics and hence impose different communication requirements on the system. Explicit communication mechanisms would allow coherence management to be tailored under software control to match these differing needs, with the aim of closely approximating a zero-overhead machine from the application's perspective. Toward these goals, we first analyze the sharing characteristics observed in certain specific applications. We then use these characteristics to synthesize explicit communication primitives. The proposed primitives allow selectively updating a set of processors, or requesting a stream of data ahead of its intended use. These primitives are essentially generalizations of prefetch and poststore, with the ability to specify the sharer set for poststore either statically or dynamically. The proposed primitives are to be used in conjunction with an underlying invalidation-based protocol. Used in this manner, the resulting memory system can dynamically adapt to performing either invalidations or updates to match the communication needs. Through an application-driven performance study, we show the utility of these mechanisms in reducing and tolerating communication latencies.
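
To make the idea of a selective update on top of an invalidation-based protocol concrete, here is a minimal toy sketch in Python. It is not the paper's simulator or its actual primitives: the class `ToyMachine`, the method `selective_update`, the helper `consumer_misses`, the write-through memory, and the one-producer, two-consumer access pattern are all assumptions made for illustration. The sketch only counts read misses for a producer-consumer pattern with and without a poststore-like push to a named sharer set.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Toy model: per-processor caches over one shared memory (illustrative only)."""
    num_procs: int
    memory: dict = field(default_factory=dict)
    caches: list = field(default_factory=list)
    misses: int = 0

    def __post_init__(self):
        # One private cache (address -> value) per processor.
        self.caches = [dict() for _ in range(self.num_procs)]

    def read(self, proc, addr):
        """Return the value at addr, counting a miss if proc holds no valid copy."""
        cache = self.caches[proc]
        if addr not in cache:
            self.misses += 1
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, proc, addr, value):
        """Baseline invalidation behavior: every other cached copy is dropped."""
        self.memory[addr] = value  # write-through, an assumption to keep the toy small
        for other, cache in enumerate(self.caches):
            if other != proc:
                cache.pop(addr, None)
        self.caches[proc][addr] = value

    def selective_update(self, proc, addr, sharers):
        """Hypothetical poststore-like primitive: push proc's copy to the named sharers."""
        value = self.read(proc, addr)
        for sharer in sharers:
            self.caches[sharer][addr] = value


def consumer_misses(use_update):
    """Producer 0 writes one word four times; consumers 1 and 2 read it after each write."""
    machine = ToyMachine(num_procs=4)
    for step in range(4):
        machine.write(0, 0x100, step)
        if use_update:
            machine.selective_update(0, 0x100, sharers=[1, 2])
        for consumer in (1, 2):
            assert machine.read(consumer, 0x100) == step
    return machine.misses


print(consumer_misses(use_update=False))  # 8: every consumer read misses
print(consumer_misses(use_update=True))   # 0: the data arrived before it was needed
```

In this toy model the sharer set is passed statically by the caller; a dynamically determined sharer set, which the abstract also mentions, would need some record of which processors have read the line and is left out here.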