User-controllable coherence for high performance shared memory multiprocessors

Authors:
Collin McCurdy;Charles Fischer
Affiliations:
University of Wisconsin-Madison, Madison, WI;University of Wisconsin-Madison, Madison, WI
Venue:
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2003

Citing 14
Cited 2

Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
LCM: memory system support for parallel language implementation

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An evaluation of computing paradigms for N-body simulations on distributed memory architectures

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
A fully associative software-managed cache design

Proceedings of the 27th annual international symposium on Computer architecture
Is data distribution necessary in OpenMP?

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Extending OpenMP for NUMA machines

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Parallel Computer Architecture: A Hardware/Software Approach

Parallel Computer Architecture: A Hardware/Software Approach
Performance characteristics of the SPEC OMP2001 benchmarks

ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Starfire: Extending the SMP Envelope

IEEE Micro
Large System Performance of SPEC OMP2001 Benchmarks

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
An overview of the BlueGene/L Supercomputer

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Blue Gene: a vision for protein science using a petaflop supercomputer

IBM Systems Journal - Deep computing for the life sciences

A localizing directory coherence protocol

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
System-wide performance monitors and their application to the optimization of coherent memory accesses

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

In programming high performance applications, shared address-space platforms are preferable for fine-grained computation, while distributed address-space platforms are more suitable for coarse-grained computation. However, currently only distributed address-space systems scale beyond the low hundreds of processors. In this paper we introduce a hybrid architecture that allows users to trade off local memory usage for coherence communication, making possible larger-scale shared memory architectures. We introduce a programming model and examine possible implementations of hardware mechanisms, evaluating some of the trade-offs inherent in each. Preliminary experiments on an application with particularly fine-grained communication requirements indicate that effective placement of directives can reduce coherence communication by more than a factor of 10 for 64 processors.