Simple compiler algorithms to reduce ownership overhead in cache coherence protocols

Authors:
Jonas Skeppstedt;Per Stenström
Affiliations:
Department of Computer Engineering, Lund University, P.O. Box 118, S-221 00 Lund, SWEDEN;Department of Computer Engineering, Lund University, P.O. Box 118, S-221 00 Lund, SWEDEN
Venue:
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Year:
1994

Abstract

This paper studies the design and efficiency of compiler algorithms that remove ownership overhead in shared-memory multiprocessors with write-invalidate protocols. The algorithms detect loads that are followed by stores to the same address. Each such load is marked, and the mark serves as a hint to the cache to obtain an exclusive copy of the block. We consider three algorithms: the first focuses on load-store sequences within each basic block of code, and the other two analyse load-store sequences across basic blocks at the intra-procedural level. Because the dataflow analysis we adopt is a trivial variation of live-variable analysis, the algorithms are easily incorporated into a compiler.

Through detailed simulations of a cache-coherent NUMA architecture using five scientific parallel benchmark programs, we find that the algorithms can remove over 95% of the separate ownership acquisitions. Moreover, even the simplest algorithm is comparable in efficiency with previously proposed hardware-based adaptive cache coherence protocols that attack the same problem.
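The idea of marking loads that are followed by a store to the same address can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal illustration under several assumptions: instructions are reduced to hypothetical `(opcode, address)` pairs with symbolic addresses, aliasing is ignored, nothing ever "kills" a pending store, and a load is marked if a store follows on any path (a union meet, as in live-variable analysis). The names `mark_block` and `mark_procedure` are invented for this example. `mark_block` corresponds roughly to a basic-block-local scan, and `mark_procedure` to a backward dataflow pass across basic blocks within one procedure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical instruction form: (opcode, symbolic address), e.g. ("load", "x").
Instr = Tuple[str, str]


def mark_block(instrs: List[Instr],
               stored_after: Set[str]) -> Tuple[List[int], Set[str]]:
    """Scan one basic block backward.

    stored_after holds addresses that may be stored after the block exits.
    Returns the indices of loads that are followed by a store to the same
    address, and the set of addresses that may be stored at or after entry.
    """
    pending = set(stored_after)  # addresses with a store still ahead
    marked: List[int] = []
    for i in range(len(instrs) - 1, -1, -1):
        op, addr = instrs[i]
        if op == "store":
            pending.add(addr)
        elif op == "load" and addr in pending:
            marked.append(i)  # hint: fetch this block in exclusive state
    marked.reverse()
    return marked, pending


def mark_procedure(blocks: Dict[str, List[Instr]],
                   succs: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Backward dataflow over the control-flow graph of one procedure.

    Iterates to a fixed point, then marks loads in every block using the
    union of the successors' entry sets as the block's exit set.
    """
    store_in: Dict[str, Set[str]] = {b: set() for b in blocks}

    def exit_set(b: str) -> Set[str]:
        out: Set[str] = set()
        for s in succs.get(b, []):
            out |= store_in[s]
        return out

    changed = True
    while changed:
        changed = False
        for b in blocks:
            _, new_in = mark_block(blocks[b], exit_set(b))
            if new_in != store_in[b]:
                store_in[b] = new_in
                changed = True

    return {b: mark_block(blocks[b], exit_set(b))[0] for b in blocks}


# Example: the load of x in "entry" is followed by a store only in "then".
blocks = {
    "entry": [("load", "x"), ("load", "y")],
    "then": [("store", "x")],
    "exit": [("load", "z")],
}
succs = {"entry": ["then", "exit"], "then": ["exit"], "exit": []}

local_only = mark_block(blocks["entry"], set())[0]
across_blocks = mark_procedure(blocks, succs)
```

In this example the block-local scan marks nothing in `entry`, because the store to `x` lives in a different basic block, while the cross-block pass marks the load of `x`. Using a union meet marks a load even when the store happens on only one path; a stricter variant could use intersection so that only loads followed by a store on every path are marked.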