The Thread-Based Protocol Engines for CC-NUMA Multiprocessors
ICPP '00: Proceedings of the 2000 International Conference on Parallel Processing
In distributed shared-memory (DSM) multiprocessors, a write operation requires multiple messages to invalidate the nodes that share and cache the memory block being written. The resulting write stall time is a performance hurdle for such systems. One approach to efficient invalidation is to use multicast messages to reach the sharing nodes. In this paper, we use application-driven simulation to evaluate two multicast-based invalidation schemes: dual-path [10] and pruning [11]. Under our experimental settings, we found that multicasts reduce invalidation traffic for four of the six real applications evaluated. The remaining two programs are computation-intensive, so multicast-based invalidation is less effective for them. However, because these programs induce bursty communication, we found that multicasts help relieve network congestion during those bursts. Dual-path performs slightly better than pruning because it is less sensitive to routing delay in the routers. We also found that cache size is an important design parameter for multicast-based invalidation: it is more effective in DSM multiprocessors with larger caches.
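The message-count argument above can be illustrated with a small sketch. It assumes a 2D mesh whose nodes are labeled along a snake-order (boustrophedon) Hamiltonian path, which is how dual-path multicast routing is commonly described. It is not taken from this paper or from [10]. The function names and the 4x4 example are hypothetical. The sketch counts only invalidation requests, with no acknowledgments, and it does not model pruning [11] or router delay.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates in a 2D mesh (assumed topology)


def snake_label(node: Node, width: int) -> int:
    """Hypothetical helper: Hamiltonian-path label for a mesh node.
    Even rows are numbered left-to-right, odd rows right-to-left."""
    x, y = node
    return y * width + x if y % 2 == 0 else y * width + (width - 1 - x)


def dual_path_partition(source: Node, sharers: List[Node],
                        width: int) -> Tuple[List[Node], List[Node]]:
    """Split the sharers into two destination lists, as in dual-path routing:
    the high path visits labels above the source in ascending order, and
    the low path visits labels below the source in descending order."""
    src = snake_label(source, width)
    high = sorted((n for n in sharers if snake_label(n, width) > src),
                  key=lambda n: snake_label(n, width))
    low = sorted((n for n in sharers if snake_label(n, width) < src),
                 key=lambda n: snake_label(n, width), reverse=True)
    return high, low


def invalidation_messages(source: Node, sharers: List[Node],
                          width: int) -> Tuple[int, int]:
    """Compare the invalidation requests injected by the writer:
    unicast sends one message per sharer, dual-path at most two."""
    targets = [n for n in sharers if n != source]
    unicast = len(targets)
    high, low = dual_path_partition(source, targets, width)
    multicast = int(bool(high)) + int(bool(low))
    return unicast, multicast


if __name__ == "__main__":
    # Hypothetical 4x4 mesh: the writer at (1, 1) holds a block cached by 5 nodes.
    sharers = [(0, 0), (3, 0), (2, 2), (0, 3), (3, 3)]
    print(dual_path_partition((1, 1), sharers, width=4))
    print(invalidation_messages((1, 1), sharers, width=4))
```

In this example the writer injects five unicast invalidations, but only two dual-path messages, one per path. The saving grows with the number of sharers. This is consistent with the observation that the benefit of multicast depends on how widely blocks are shared.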