Using dataflow analysis techniques to reduce ownership overhead in cache coherence protocols

Authors:
Jonas Skeppstedt;Per Stenström
Affiliations:
Chalmers Univ. of Technology, Go¨teborg, Sweden;Chalmers Univ. of Technology, Go¨teborg, Sweden
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
1996

Citing 18
Cited 0

Compilers: principles, techniques, and tools

Compilers: principles, techniques, and tools
A Survey of Cache Coherence Schemes for Multiprocessors

Computer
The priority-based coloring approach to register allocation

ACM Transactions on Programming Languages and Systems (TOPLAS)
Performance evaluation of memory consistency models for shared-memory multiprocessors

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors

Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
The Stanford Dash Multiprocessor

Computer
SPLASH: Stanford parallel applications for shared-memory

ACM SIGARCH Computer Architecture News
Cache Invalidation Patterns in Shared-Memory Multiprocessors

IEEE Transactions on Computers
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Cooperative shared memory: software and hardware for scalable multiprocessor

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Adaptive cache coherency for detecting migratory shared data

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Simple compiler algorithms to reduce ownership overhead in cache coherence protocols

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Essential misses and data traffic in coherence protocols

Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
Tolerating latency through software-controlled data prefetching

Tolerating latency through software-controlled data prefetching
The MIT Alewife machine: architecture and performance

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A compiler algorithm that reduces read latency in ownership-based cache coherence protocols

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Memory consistency and event ordering in scalable shared-memory multiprocessors

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this article, we explore the potential of classical dataflow analysis techniques in removing overhead in write-invalidate cache coherence protocols for shared-memory multiprocessors. We construct the compiler algorithms with varying degree of sophistication that detect loads followed by stores to the same address. Such loads are marked and constitute a hint to the cache to obtain an exclusive copy of the block so that the subsequent store does not introduce access penalties. The simplest of the three compiler algorithms analyzes the existence of load-store sequences within each basic blocks of code whereas the other two analyze load-store sequences across basic blocks at the intraprocedural level. The algorithms have been incorporated into an optimizing C compiler, and we have evaluated their efficiencies by compiling and executing seven parallel programs on a simulated multiprocessor. Our results show that the detection efficiency of the most aggressive algorithm is 96% or higher for four of the seven programs studied. We also compare the efficiency of these static algorithms with that of dynamic hardware-based algorithms that reduce ownership overhead. We find that the static analysis using classical dataflow analysis results in similar performance improvements as dynamic hardware-based approaches.