Towards general and exact distributed invalidation

Authors:
M. F. P. O'Boyle;R. W. Ford;E. A. Stohr
Affiliations:
Institute for Computing Systems Architecture, School of Informatics, The University of Edinburgh, James Clerk Maxwell Building, King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, UK;Centre for Novel Computing, Department of Computer Science, The University of Manchester, Manchester M13 9PL, UK;Centre for Novel Computing, Department of Computer Science, The University of Manchester, Manchester M13 9PL, UK
Venue:
Journal of Parallel and Distributed Computing
Year:
2003

Abstract

This paper develops and proves an exact distributed invalidation algorithm for programs with general array accesses, arbitrary parallelisation and migratory writes. We present an efficient constructive algorithm that globally combines locally gathered information to insert coherence calls so that invalidation traffic is eliminated without loss of locality, while placing the minimal number of coherence calls. Experimental results across a range of benchmarks show that it outperforms hardware-based sequential and release consistency approaches, decreasing application execution time by up to 12%. This improvement is due to the elimination of over 99% of the invalidation traffic in all benchmarks, which in turn reduces the total amount of network traffic by up to 28% and the number of network words transmitted by up to 19%.
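
The general idea of replacing invalidation messages with locally issued coherence calls can be illustrated with a toy model. The sketch below is not the algorithm of the paper: the `Epoch` representation, the `simulate` function and the example `trace` are all hypothetical, and the model assumes that the writes of every epoch (the code between two barriers) are known in advance and that no processor reads an element another processor writes in the same epoch. Under those assumptions, each processor drops only the cached elements that another processor is about to write, so no invalidation message is needed and the rest of its cache is retained.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model: an epoch (the code between two barriers) maps a
# processor id to the sets of array indices it reads and writes.
Epoch = Dict[int, Tuple[Set[int], Set[int]]]


def simulate(epochs: List[Epoch], self_invalidate: bool) -> Tuple[int, int]:
    """Return (invalidation messages, local coherence calls) for a trace."""
    procs = {p for epoch in epochs for p in epoch}
    cached: Dict[int, Set[int]] = {p: set() for p in procs}
    messages = 0
    local_calls = 0
    for epoch in epochs:
        if self_invalidate:
            # Assumed compile-time knowledge: the indices that other
            # processors write in this epoch. Each processor drops only
            # those copies, so the rest of its cache keeps its locality.
            for q in procs:
                written_by_others: Set[int] = set()
                for p, (_, writes) in epoch.items():
                    if p != q:
                        written_by_others |= writes
                stale = cached[q] & written_by_others
                if stale:
                    cached[q] -= stale
                    local_calls += 1
        for p, (reads, writes) in epoch.items():
            # Write-invalidate: each remote cached copy costs one message.
            for q in procs:
                if q != p:
                    stale = cached[q] & writes
                    messages += len(stale)
                    cached[q] -= stale
            cached[p] |= reads | writes
    return messages, local_calls


# Example trace on a 4-element array with two processors. In the last
# epoch each processor writes an element the other one has cached.
trace: List[Epoch] = [
    {0: (set(), {0, 1}), 1: (set(), {2, 3})},
    {0: ({2}, {0}), 1: ({1}, {3})},
    {0: (set(), {2}), 1: (set(), {1})},
]

hardware = simulate(trace, self_invalidate=False)
distributed = simulate(trace, self_invalidate=True)
```

On this trace the write-invalidate run sends two invalidation messages, while the self-invalidating run sends none and instead issues two local calls. The model counts only invalidations; it ignores read misses, write-backs and the placement analysis that a real compiler would need.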