High Performance Remote Memory Access Communication: The ARMCI Approach
International Journal of High Performance Computing Applications
International Journal of High Performance Computing and Networking
This paper examines two latency-hiding techniques, prefetching and weak consistency, in large-scale shared-memory multiprocessors with compiler-controlled cache coherence management, and studies how these techniques interact with network bandwidth. The performance effect of latency hiding is evaluated and compared across a range of network channel bandwidths. The interaction of reads, writes, and prefetches under limited bandwidth is studied, and an approach that improves network bandwidth utilization by limiting the number of outstanding requests at each node is investigated. Increasing network (channel) bandwidth helps both prefetching and non-prefetching systems, with the initial 2x increase in bandwidth giving the most improvement. For some benchmarks on a 128-processor system, prefetching can deliver a much larger improvement than increasing network bandwidth, even at the minimal bandwidth. Controlling bandwidth utilization is shown to be important when prefetch and write request rates are high.
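The idea of limiting the number of outstanding requests in each node can be illustrated with a minimal sketch. This is not the mechanism evaluated in the paper, which the abstract does not describe. The class name `RequestThrottle`, the FIFO queueing policy, and the request identifiers are all assumptions made for illustration.

```python
from collections import deque


class RequestThrottle:
    """Hypothetical per-node cap on in-flight network requests."""

    def __init__(self, max_outstanding):
        if max_outstanding < 1:
            raise ValueError("max_outstanding must be at least 1")
        self.max_outstanding = max_outstanding
        self.in_flight = set()   # requests currently occupying the network
        self.pending = deque()   # requests waiting for a free slot (FIFO, an assumption)

    def submit(self, request_id):
        """Issue the request if a slot is free, otherwise queue it.

        Returns True if the request was issued immediately.
        """
        if len(self.in_flight) < self.max_outstanding:
            self.in_flight.add(request_id)
            return True
        self.pending.append(request_id)
        return False

    def complete(self, request_id):
        """Retire a finished request and issue the oldest queued one.

        Returns the id of the newly issued request, or None if none was waiting.
        """
        self.in_flight.remove(request_id)
        if self.pending:
            next_id = self.pending.popleft()
            self.in_flight.add(next_id)
            return next_id
        return None
```

With a limit of two, a read and a prefetch would be issued at once, while a following write would wait until one of them completes. A real system might instead prioritize demand reads over prefetches; the FIFO order here is only the simplest choice.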