Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
The priority-based coloring approach to register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
The Stanford Dash Multiprocessor
Computer
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Cache Invalidation Patterns in Shared-Memory Multiprocessors
IEEE Transactions on Computers
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Cooperative shared memory: software and hardware for scalable multiprocessor
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Adaptive cache coherency for detecting migratory shared data
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Simple compiler algorithms to reduce ownership overhead in cache coherence protocols
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Essential misses and data traffic in coherence protocols
Journal of Parallel and Distributed Computing - Special issue on distributed shared memory systems
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A compiler algorithm that reduces read latency in ownership-based cache coherence protocols
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Hi-index | 0.00 |
In this article, we explore the potential of classical dataflow analysis techniques in removing overhead in write-invalidate cache coherence protocols for shared-memory multiprocessors. We construct the compiler algorithms with varying degree of sophistication that detect loads followed by stores to the same address. Such loads are marked and constitute a hint to the cache to obtain an exclusive copy of the block so that the subsequent store does not introduce access penalties. The simplest of the three compiler algorithms analyzes the existence of load-store sequences within each basic blocks of code whereas the other two analyze load-store sequences across basic blocks at the intraprocedural level. The algorithms have been incorporated into an optimizing C compiler, and we have evaluated their efficiencies by compiling and executing seven parallel programs on a simulated multiprocessor. Our results show that the detection efficiency of the most aggressive algorithm is 96% or higher for four of the seven programs studied. We also compare the efficiency of these static algorithms with that of dynamic hardware-based algorithms that reduce ownership overhead. We find that the static analysis using classical dataflow analysis results in similar performance improvements as dynamic hardware-based approaches.