An evaluation of a compiler optimization for improving the performance of a coherence directory

  • Authors:
  • Farnaz Mounes-Toussi;David J. Lilja;Zhiyuan Li

  • Affiliations:
  • Department of Electrical Engineering, 200 Union Street S. E., University of Minnesota, Minneapolis, MN;Department of Electrical Engineering, 200 Union Street S. E., University of Minnesota, Minneapolis, MN;Department of Computer Science, 200 Union Street S. E., University of Minnesota, Minneapolis, MN

  • Venue:
  • ICS '94 Proceedings of the 8th international conference on Supercomputing
  • Year:
  • 1994

Abstract

Both hardware-controlled and compiler-directed mechanisms have been proposed for maintaining cache coherence in large-scale shared-memory multiprocessors, but both approaches have significant limitations. We examine the potential performance improvement of a new combined software-hardware cache coherence mechanism. This approach augments the run-time information available to a directory-based coherence mechanism with compile-time analysis that statically identifies write references that cannot cause coherence problems and writes that should be written through to memory. These references are marked as not needing invalidation messages, which reduces the network traffic produced by the directory while maintaining cache consistency. For memory references that are ambiguous, for instance because of conditional branches or because resolving them would require complex data-flow analysis, the compiler marks the references conservatively and relies on the hardware directory to ensure coherence. Trace-driven simulations emulate the compile-time analysis on memory traces and estimate the potential performance improvement that could be expected from a compiler performing this optimization on the Perfect Club benchmark programs. By reducing the number of invalidations, the optimized directory scheme can reduce processor-memory network traffic by up to 54 percent compared to an unoptimized directory mechanism. In addition, the overall miss ratio can be reduced by up to 42 percent due to a corresponding reduction in the number of write misses.
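
The idea of letting compiler marks suppress directory invalidations can be illustrated with a toy trace replay. The sketch below is not the authors' simulator. The `Mark` values, the `Directory` class, the `count_invalidations` helper, and the five-entry trace are all hypothetical names and data invented for illustration, and the model ignores write-through, timing, and cache capacity. It only counts invalidation messages with and without the marks.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mark(Enum):
    # Hypothetical labels; the abstract does not name the compiler's marks.
    CONSERVATIVE = "conservative"    # not proven safe: the hardware directory handles it
    NO_INVALIDATE = "no_invalidate"  # assumed proven safe at compile time


@dataclass
class Directory:
    """Toy directory: tracks which processors hold a copy of each block."""
    sharers: dict = field(default_factory=dict)  # block -> set of processor ids
    invalidations_sent: int = 0

    def read(self, proc, block):
        self.sharers.setdefault(block, set()).add(proc)

    def write(self, proc, block, mark):
        holders = self.sharers.setdefault(block, set())
        if mark is Mark.CONSERVATIVE:
            # One invalidation message per other processor holding a copy.
            self.invalidations_sent += len(holders - {proc})
            holders.clear()
        # A NO_INVALIDATE write sends no messages; other copies stay listed.
        holders.add(proc)


def count_invalidations(trace, use_marks):
    """Replay a trace; with use_marks=False every write is treated conservatively."""
    directory = Directory()
    for op, proc, block, mark in trace:
        if op == "R":
            directory.read(proc, block)
        else:
            directory.write(proc, block, mark if use_marks else Mark.CONSERVATIVE)
    return directory.invalidations_sent


# Made-up trace of (operation, processor, block, compiler mark).
TRACE = [
    ("R", 0, "A", None),
    ("R", 1, "A", None),
    ("W", 0, "A", Mark.NO_INVALIDATE),  # assumed: analysis showed this write is safe
    ("R", 1, "B", None),
    ("W", 0, "B", Mark.CONSERVATIVE),   # ambiguous reference: left to the hardware
]

print(count_invalidations(TRACE, use_marks=False))  # 2
print(count_invalidations(TRACE, use_marks=True))   # 1
```

On this invented trace the marked write saves one of two invalidation messages. The figure says nothing about the reductions reported for the Perfect Club programs; it only shows where the savings would come from.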