Store Memory-Level Parallelism Optimizations for Commercial Applications
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
We evaluate the performance impact of two different write-buffer configurations (one word per buffer entry and one block per buffer entry) and two different write policies (write-through and write-back), when using the partial block invalidation coherence mechanism in a shared-memory multiprocessor. Using an execution-driven simulator, we find that the one word per entry buffer configuration with a write-back policy is preferred for small write-buffer sizes when both buffers have an equal number of data words, and when they have equal hardware cost. Furthermore, when partial block invalidation is supported, we find that a write-through policy is preferred over a write-back policy due to its simpler cache hit detection mechanism, its elimination of write-back transactions, and its competitive-performance when the write-buffer is relatively large.