Analysis of cache-coherence bottlenecks with hybrid hardware/software techniques

  • Authors:
  • Jaydeep Marathe; Frank Mueller; Bronis R. de Supinski

  • Affiliations:
  • North Carolina State University, Raleigh, NC; North Carolina State University, Raleigh, NC; Lawrence Livermore National Laboratory, Livermore, CA

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2006

Abstract

Application performance on high-performance shared-memory systems is often limited by sharing patterns that cause cache-coherence bottlenecks. Current approaches to identifying coherence bottlenecks incur considerable run-time overhead and do not scale. We present two novel hardware-assisted coherence-analysis techniques that reduce trace sizes by two orders of magnitude compared with full traces. First, hardware performance monitoring is combined with capturing stores in software to provide a lossy-trace mechanism, which is an order of magnitude faster than full tracing via software instrumentation while retaining accuracy. Second, selected long-latency loads are instrumented via binary rewriting, which provides even higher accuracy and finer control over tracing but incurs additional overhead.
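To make the lossy-trace idea more concrete, here is a minimal Python sketch. It is not the authors' tool. It assumes a trace in which every store is recorded (as software capture would provide) but only a sample of loads is present (as hardware performance monitoring might supply). It also assumes a 64-byte cache line and a simple invalidation-based coherence model. The names `Access`, `merge_lossy_trace`, and `count_coherence_events` are hypothetical. Because loads are only sampled, the per-line sharer sets are incomplete, so the counts are approximations rather than exact coherence statistics.

```python
from collections import defaultdict
from dataclasses import dataclass

LINE_SIZE = 64  # assumption: 64-byte cache lines


@dataclass(frozen=True)
class Access:
    cpu: int
    addr: int
    is_store: bool


def merge_lossy_trace(stores, sampled_loads):
    """Hypothetical merge of (timestamp, Access) pairs: all stores captured in
    software plus a sampled subset of loads, ordered by timestamp."""
    events = sorted(list(stores) + list(sampled_loads), key=lambda e: e[0])
    return [acc for _, acc in events]


def count_coherence_events(trace, line_size=LINE_SIZE):
    """Replay a trace through a simple invalidation-based coherence model.

    Returns (invalidations, coherence_misses), both dicts keyed by cache-line
    number: how many remote copies each line's stores invalidated, and how
    many accesses missed because the CPU's copy had been invalidated.
    """
    sharers = defaultdict(set)      # line -> CPUs holding a valid copy
    invalidated = defaultdict(set)  # line -> CPUs whose copy was invalidated
    invalidations = defaultdict(int)
    coherence_misses = defaultdict(int)

    for acc in trace:
        line = acc.addr // line_size
        # Re-access after a remote store invalidated our copy: coherence miss.
        if acc.cpu in invalidated[line]:
            coherence_misses[line] += 1
            invalidated[line].discard(acc.cpu)
        if acc.is_store:
            # A store invalidates every other CPU's copy of the line.
            others = sharers[line] - {acc.cpu}
            if others:
                invalidations[line] += len(others)
                invalidated[line] |= others
            sharers[line] = {acc.cpu}
        else:
            sharers[line].add(acc.cpu)

    return dict(invalidations), dict(coherence_misses)


if __name__ == "__main__":
    # Two CPUs write different words that fall in the same 64-byte line.
    stores = [(0, Access(cpu=0, addr=0x1000, is_store=True)),
              (1, Access(cpu=1, addr=0x1008, is_store=True))]
    sampled_loads = [(2, Access(cpu=0, addr=0x1000, is_store=False))]
    inv, miss = count_coherence_events(merge_lossy_trace(stores, sampled_loads))
    print(inv, miss)  # prints {64: 1} {64: 1}
```

In the usage example, CPU 1's store to a different word of the same line invalidates CPU 0's copy, and CPU 0's later sampled load then shows up as a coherence miss. This is the kind of false-sharing signature such an analysis is meant to expose. A load that was not sampled would simply be missing from the trace, which is the accuracy trade-off of the lossy approach.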