Optimizing software cache-coherent cluster architectures
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Hi-index | 0.00 |
Recent developments in shared-memory multiprocessor systems advocate using off-the-shelf hardware to provide basic communication mechanisms and using software to implement cache coherence policies. The exposure of communication mechanisms to software opens many opportunities for enhancing application performance. In this paper we propose a set of communication primitives implemented on a communication co-processor that introduce a flavor of message passing and permit protocol optimization. To assess the overhead of the software implementation of the primitives and protocols, we compare a PRAM model, a hardware cache coherence scheme, a software scheme implementing only the basic cache coherence protocol, and an optimized software solution supporting the additional communication primitives and running with applications annotated with those primitives. With the parameters we chose for the communication processor, the overall memory system overhead of the basic software scheme is at least 50% higher than that of the hardware implementation. With the adequate insertion of the communication primitives, the optimized software solution has a performance comparable to that of the hardware scheme.