Previous studies of bus-based shared-memory multiprocessors have shown that hybrid write-invalidate/write-update snooping protocols fail to deliver consistent performance improvements over write-invalidate protocols. In this paper, we analyze the deficiencies of hybrid snooping protocols under release consistency and show how these deficiencies can be dramatically reduced by using write caches and read snarfing.

Our performance evaluation is based on program-driven simulation and a set of five scientific applications with different sharing behaviors, including migratory sharing as well as producer-consumer sharing. We show that a hybrid protocol extended with both write caches and read snarfing reduces the number of coherence misses by 83% to 95% compared with a write-invalidate protocol, for all five applications in this study. In addition, it reduces the number of bus transactions by 36% to 60% for four of the applications and by 9% for the fifth. Because the hybrid protocol and the two extensions are cheap to implement, we believe this combination is an effective way to boost the performance of bus-based multiprocessors.
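The abstract names read snarfing without describing it. As it is commonly described, read snarfing lets a cache that still holds an invalidated tag for a block pick up the data when another cache's read miss brings that block across the bus, so the snarfing cache avoids its own later miss. The Python sketch below is a hypothetical toy model, not the simulator used in this paper. It deliberately uses a plain write-invalidate protocol on a single block (no update mode, no write cache) to isolate the effect of snarfing in a one-producer, several-consumer pattern. The names `ToySnoopingBus` and `producer_consumer` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical toy model, not the simulator used in the paper.
# Per-cache state for a single shared block:
#   None -> block not present (a read is a cold miss)
#   "I"  -> tag present but invalidated (a read is a coherence miss)
#   "V"  -> valid copy (a read hits)


@dataclass
class ToySnoopingBus:
    n_caches: int
    read_snarfing: bool = False
    state: List[Optional[str]] = field(init=False, default_factory=list)
    cold_misses: int = 0
    coherence_misses: int = 0
    bus_transactions: int = 0

    def __post_init__(self) -> None:
        self.state = [None] * self.n_caches

    def write(self, cpu: int) -> None:
        """Write-invalidate: the writer ends up with the only valid copy."""
        others_valid = any(s == "V" for i, s in enumerate(self.state) if i != cpu)
        if self.state[cpu] != "V" or others_valid:
            self.bus_transactions += 1  # fetch and/or invalidate on the bus
        for i in range(self.n_caches):
            if i != cpu and self.state[i] == "V":
                self.state[i] = "I"  # keep the tag, drop the data
        self.state[cpu] = "V"

    def read(self, cpu: int) -> None:
        if self.state[cpu] == "V":
            return  # hit, no bus traffic
        if self.state[cpu] == "I":
            self.coherence_misses += 1
        else:
            self.cold_misses += 1
        self.bus_transactions += 1  # the read miss goes on the bus
        self.state[cpu] = "V"
        if self.read_snarfing:
            # Caches holding an invalidated tag for this block grab the data
            # as it passes on the bus, so their next read will hit.
            for i in range(self.n_caches):
                if self.state[i] == "I":
                    self.state[i] = "V"


def producer_consumer(bus: ToySnoopingBus, rounds: int) -> ToySnoopingBus:
    """CPU 0 produces (writes) the block; every other CPU then reads it."""
    for _ in range(rounds):
        bus.write(0)
        for consumer in range(1, bus.n_caches):
            bus.read(consumer)
    return bus


if __name__ == "__main__":
    for snarf in (False, True):
        b = producer_consumer(ToySnoopingBus(4, read_snarfing=snarf), rounds=10)
        print(f"snarfing={snarf}: coherence misses={b.coherence_misses}, "
              f"bus transactions={b.bus_transactions}")
```

With 4 caches and 10 rounds, this toy reports 27 coherence misses and 40 bus transactions without snarfing, versus 9 and 22 with it: after each invalidation only the first consumer misses, and the other consumers snarf the data. These numbers describe only this contrived trace and are not the paper's measured results.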