System-wide performance monitors and their application to the optimization of coherent memory accesses

  • Authors:
  • Jean-Francois Collard;Norm Jouppi;Sami Yehia

  • Affiliations:
  • Hewlett-Packard Laboratories, Palo Alto, CA;Hewlett-Packard Laboratories, Palo Alto, CA;ARM Ltd, Cambridge, UK

  • Venue:
  • Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Inspired by recent advances in microprocessor performance monitors, this paper shows how a shared-memory multiprocessor chipset and interconnect can be equipped with performance monitors that associate performance events with the PCs of the individual instructions causing these events. Such monitors greatly simplify performance debugging of shared-memory programs---for example, they make finding pairs of instructions in false sharing straightforward. These monitors also enable precise feedback-directed compiler optimizations and, as a second contribution, we show how they can guide the code generator to use the version of the load instruction that makes the best use of the coherence protocol. Experiments show up to almost 10% coherence traffic reduction on SPLASH2 applications.