Memory Access Dependencies in Shared-Memory Multiprocessors
IEEE Transactions on Software Engineering
Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Adaptive cache coherency for detecting migratory shared data
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Architectural mechanisms for explicit communication in shared memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Techniques for reducing overheads of shared-memory multiprocessing
ICS '95 Proceedings of the 9th international conference on Supercomputing
Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
A cost-comparison approach for adaptive distributed shared memory
ICS '96 Proceedings of the 10th international conference on Supercomputing
Verification techniques for cache coherence protocols
ACM Computing Surveys (CSUR)
The interaction of parallel programming constructs and coherence protocols
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared-memory performance profiling
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Analytical Prediction of Performance for Cache Coherence Protocols
IEEE Transactions on Computers
Evaluating the Effect of Coherence Protocols on the Performance of Parallel Programming Constructs
International Journal of Parallel Programming
IEEE Transactions on Computers
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
IEEE Transactions on Computers
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Reducing the Write Traffic for a Hybrid Cache Protocol
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Implicit transactional memory in kilo-instruction multiprocessors
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Hi-index | 0.01 |
We consider three simple extensions to directory-based cache coherence protocols in shared-memory multiprocessors. These extensions are aimed at reducing the penalties associated with memory accesses and include a hardware prefetching scheme, a migratory sharing optimization, and a competitive-update mechanism. Since they target different components of the read and write penalties, they can be combined effectively.Detailed architectural simulations using five benchmarks show substantial combined performance gains obtained at a modest additional hardware cost. Prefetching in combination with competitive-update is the best combination under release consistency in systems with sufficient network bandwidth. By contrast, prefetching plus the migratory sharing optimization is advantageous under sequential consistency and/or in systems with limited network bandwidth.