Combined performance gains of simple cache protocol extensions

Authors:
F. Dahlgren; M. Dubois; P. Stenström
Affiliations:
Department of Computer Engineering, Lund University, P.O. Box 118, S-221 00 Lund, Sweden (F. Dahlgren, P. Stenström)
Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA (M. Dubois)
Venue:
ISCA '94: Proceedings of the 21st Annual International Symposium on Computer Architecture
Year:
1994

Abstract

We consider three simple extensions to directory-based cache coherence protocols for shared-memory multiprocessors: a hardware prefetching scheme, a migratory sharing optimization, and a competitive-update mechanism. All three aim to reduce the penalties associated with memory accesses. Because each targets a different component of the read and write penalties, they can be combined effectively.

Detailed architectural simulations using five benchmarks show substantial combined performance gains at a modest additional hardware cost. Under release consistency, in systems with sufficient network bandwidth, prefetching combined with the competitive-update mechanism is the best combination. By contrast, prefetching plus the migratory sharing optimization is advantageous under sequential consistency, in systems with limited network bandwidth, or both.
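To make the competitive-update idea concrete, here is a minimal sketch, assuming the common formulation: a write-update protocol in which each cached copy carries a counter that every remote update decrements and every local access re-arms; when the counter reaches zero, the copy invalidates itself so that further writes stop sending it updates. It models a single block only and ignores the directory, the network, and any write buffering a real protocol needs. The names `CachedCopy`, `run`, and `COMPETITIVE_THRESHOLD` (and its value of 4) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical threshold (not taken from the paper): how many remote updates a
# copy may absorb with no intervening local access before it is dropped.
COMPETITIVE_THRESHOLD = 4


@dataclass
class CachedCopy:
    """One processor's copy of a shared block under competitive update."""
    counter: int = COMPETITIVE_THRESHOLD
    valid: bool = True

    def local_access(self) -> None:
        # A local read or write shows the copy is still in use: re-arm the counter.
        self.counter = COMPETITIVE_THRESHOLD

    def receive_update(self) -> None:
        # Each remote write delivers one update and consumes one unit of the counter.
        self.counter -= 1
        if self.counter == 0:
            # Self-invalidate, so later writes stop sending updates to this cache.
            self.valid = False


def run(trace: List[Tuple[int, str]], num_procs: int) -> Tuple[int, Dict[int, CachedCopy]]:
    """Replay (processor, 'r' or 'w') accesses to ONE block and count update messages.

    Simplifying assumptions: every processor starts with a valid copy, and a
    processor whose copy was invalidated simply re-fetches it on its next access.
    """
    copies = {p: CachedCopy() for p in range(num_procs)}
    updates_sent = 0
    for proc, op in trace:
        mine = copies[proc]
        if not mine.valid:
            mine = CachedCopy()  # coherence miss: fetch a fresh copy
            copies[proc] = mine
        mine.local_access()
        if op == "w":
            # Write-update: send the new value to every other valid copy.
            for other, copy in copies.items():
                if other != proc and copy.valid:
                    copy.receive_update()
                    updates_sent += 1
    return updates_sent, copies


# One writer, one idle sharer: the idle copy is dropped after 4 updates.
sent, copies = run([(1, "w")] * 10, num_procs=2)
print(sent, copies[0].valid)  # 4 False
```

With one writer and an idle sharer, a pure write-update policy would send ten updates; here the idle copy is dropped after four, and the remaining writes generate no update traffic. If the sharer instead reads between the writes (for example `[(1, "w"), (0, "r")] * 5`), its counter keeps being re-armed and its copy stays valid, so it avoids the coherence misses a write-invalidate protocol would incur.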