Modern high-performance networks used in scalable distributed shared-memory (DSM) systems support multiple paths between nodes to increase bandwidth, reduce contention, or both. Such networks violate the constraint of pairwise in-order message delivery that many existing directory-based cache coherence protocols implicitly require. Computer architects currently use two alternative strategies to solve this problem. The first strategy, used in the SGI Origin series, is to employ an intelligent cache coherence protocol that detects and resolves all race conditions caused by out-of-order (OoO) events. The second strategy, used in the HAL Mercury series, is to use a sophisticated network interface (NI) that detects and remedies every OoO event before the messages are fed to the cache coherence controllers.

Both strategies involve complicated hardware logic, either at the cache coherence controller level or at the NI level. In this paper, we propose a new strategy that uses block correlated FIFO channels. This new strategy detects all potential race conditions and prevents them from occurring. It allows the use of a simple cache coherence protocol and an inexpensive NI. We also present an efficient implementation of this strategy based on current technology. Detailed simulations using benchmark applications evaluate the performance of the new strategy. The results indicate that, compared to the existing strategies, the new strategy always provides either the best or close to the best overall performance. This study also provides valuable insights into the design trade-offs in incorporating modern networks into DSM systems.
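The abstract does not spell out how block correlated FIFO channels work, so the following is only a minimal sketch of one plausible reading: every message concerning a given memory block is steered onto the same FIFO path between a pair of nodes, so per-block ordering survives even though the network as a whole delivers out of order. The names `BLOCK_SIZE`, `channel_for`, and `MultipathLink`, and the modulo mapping rule, are hypothetical illustrations and are not taken from the paper.

```python
from collections import deque

BLOCK_SIZE = 64  # bytes per cache block (assumed value, not from the paper)


def channel_for(block_addr: int, num_paths: int) -> int:
    """Map a block address to one FIFO path (hypothetical mapping rule)."""
    return (block_addr // BLOCK_SIZE) % num_paths


class MultipathLink:
    """Toy model of several FIFO paths between one source/destination pair."""

    def __init__(self, num_paths: int):
        self.paths = [deque() for _ in range(num_paths)]

    def send(self, block_addr: int, msg: str) -> None:
        # All messages for the same block share one path, hence one FIFO.
        path = channel_for(block_addr, len(self.paths))
        self.paths[path].append((block_addr, msg))

    def deliver(self, path_order):
        """Drain the paths in the given order, modeling unequal path latency."""
        delivered = []
        for p in path_order:
            while self.paths[p]:
                delivered.append(self.paths[p].popleft())
        return delivered


def per_block(delivered, block_addr):
    """Return the messages for one block in arrival order."""
    return [msg for addr, msg in delivered if addr == block_addr]


link = MultipathLink(num_paths=4)
link.send(0x000, "req")
link.send(0x040, "req")
link.send(0x000, "inv")
link.send(0x040, "ack")

# Paths drain in reverse order, so global arrival order differs from send order.
arrived = link.deliver([3, 2, 1, 0])
print(per_block(arrived, 0x000))
print(per_block(arrived, 0x040))
```

In this toy model the global arrival order differs from the send order, yet messages for each block still arrive in the order they were sent, which is the property a simple directory protocol would rely on.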