There are three major classes of MIMD multiprocessors: cache-coherent machines, NUMA (non-uniform memory reference) machines without cache coherence, and distributed-memory multicomputers. All three classes can run shared-memory applications, though the third requires software support to do so at all, and the second requires software support to do so well. We use trace-driven simulation to compare the performance of these classes and to determine the effect of various architectural features and parameters on overall program performance. For systems whose hardware or software supports both coherent caching (migration and replication) and remote reference, we use optimal off-line analysis to make the correct placement decision (cache the data locally or reference it remotely) in every case. This technique lets us evaluate architectural alternatives without worrying that the results are biased by a poor data placement policy. We find that the size of the unit of coherence (page or cache line) is the dominant factor in performance; that NUMA systems can perform comparably to cache-coherent machines; and that even relatively expensive, software-implemented remote reference is beneficial in distributed shared memory machines.
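To make the idea of an optimal off-line placement decision concrete, here is a minimal Python sketch under strong simplifying assumptions. It is not the authors' analysis. It models only migration versus remote reference (no replication, no read/write distinction), treats each coherence unit independently, lets the initial placement be free, and uses hypothetical cost values (`Costs.local`, `Costs.remote`, `Costs.migrate`) and hypothetical function names. Because the whole trace is known in advance, a small dynamic program over "which node currently holds the unit" yields the minimum cost. Grouping addresses by `unit_size` also shows one way a larger unit of coherence can raise cost: unrelated data that different processors use end up sharing a unit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    # Hypothetical per-event costs; real values depend on the machine.
    local: int = 1      # reference to the node's own memory
    remote: int = 10    # reference to another node's memory
    migrate: int = 200  # moving one coherence unit between nodes


def optimal_unit_cost(refs, nprocs, costs):
    """Off-line optimal cost for one unit under a single-copy model.

    refs lists the processor ids that touch this unit, in trace order.
    Before each reference the unit may stay put or migrate to any node.
    """
    best = [0] * nprocs  # best[n]: min cost so far with unit at node n
    for p in refs:
        cheapest_prev = min(best)
        new_best = []
        for n in range(nprocs):
            # Either the unit was already at n, or it migrates here.
            arrive = min(best[n], cheapest_prev + costs.migrate)
            access = costs.local if n == p else costs.remote
            new_best.append(arrive + access)
        best = new_best
    return min(best)


def optimal_trace_cost(trace, nprocs, unit_size, costs=Costs()):
    """Sum optimal per-unit costs for a trace of (processor, address) pairs."""
    per_unit = defaultdict(list)
    for proc, addr in trace:
        per_unit[addr // unit_size].append(proc)
    return sum(optimal_unit_cost(r, nprocs, costs) for r in per_unit.values())
```

For example, if processor 0 repeatedly touches address 0 while processor 1 alternately touches address 64, `trace = [(0, 0), (1, 64)] * 100`, then `optimal_trace_cost(trace, 2, 64)` is 200 (each processor's data sits in its own unit and stays local), whereas `optimal_trace_cost(trace, 2, 4096)` is 1100 (both addresses fall in one unit, so half of the references must be remote). Supporting replication would require tracking sets of copy holders and invalidation costs, which this sketch deliberately omits.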