Pinpointing and Exploiting Opportunities for Enhancing Data Reuse

  • Authors:
  • Gabriel Marin; John Mellor-Crummey

  • Affiliations:
  • Department of Computer Science, Rice University, 6100 Main St., MS 132, Houston, TX 77005, mgabi@cs.rice.edu;Department of Computer Science, Rice University, 6100 Main St., MS 132, Houston, TX 77005, johnmc@cs.rice.edu

  • Venue:
  • ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and Software
  • Year:
  • 2008

Abstract

The potential for improving the performance of data-intensive scientific programs by enhancing data reuse in cache is substantial because CPUs are significantly faster than memory. Traditional performance tools typically collect or simulate cache miss counts or rates and attribute them at the function level. While such information identifies program scopes that exhibit a large cache miss rate, it is often insufficient to diagnose the causes of poor data locality and to identify what program transformations would improve memory hierarchy utilization. This paper describes an approach that uses memory reuse distance to identify an application's most significant memory access patterns causing cache misses and to provide insight into ways of improving data reuse. Unlike previous approaches, our tool combines (1) analysis and instrumentation of fully optimized binaries, (2) online analysis of reuse patterns, (3) fine-grain attribution of measurements and models to statements, loops, and variables, and (4) static analysis of access patterns to quantify spatial reuse. We demonstrate the effectiveness of our approach for understanding reuse patterns in two scientific codes: one for simulating neutron transport and a second for simulating turbulent transport in burning plasmas. Our tools pinpointed opportunities for enhancing data reuse. Using this feedback as a guide, we transformed the codes, reducing their misses at various levels of the memory hierarchy by integer factors and reducing their execution time by as much as 60% and 33%, respectively.
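
To make the notion of memory reuse distance concrete, the following is a minimal sketch and not the authors' tool, which instruments optimized binaries and attributes results to statements, loops, and variables. It assumes the common definition of reuse distance as the number of distinct memory blocks accessed between two consecutive accesses to the same block, and it assumes a fully associative LRU cache when predicting misses. The names `reuse_distances`, `predicted_misses`, and the 64-byte default block size are illustrative choices.

```python
from collections import OrderedDict


def reuse_distances(trace, block_size=64):
    """Return one entry per access in `trace` (a sequence of addresses).

    Each entry is the number of distinct blocks touched since the previous
    access to the same block, or None for a first (cold) access.
    """
    stack = OrderedDict()  # blocks ordered from least to most recently used
    distances = []
    for addr in trace:
        block = addr // block_size
        if block in stack:
            # Count the distinct blocks accessed more recently than `block`.
            keys = list(stack.keys())
            distances.append(len(keys) - 1 - keys.index(block))
            stack.move_to_end(block)
        else:
            distances.append(None)
            stack[block] = True
    return distances


def predicted_misses(distances, cache_blocks):
    """Count accesses expected to miss in a fully associative LRU cache
    that holds `cache_blocks` blocks (an assumed cache model)."""
    return sum(1 for d in distances if d is None or d >= cache_blocks)


if __name__ == "__main__":
    # Hypothetical address trace touching blocks 0, 1, 2, 0, 1, 0.
    trace = [0, 64, 128, 0, 64, 0]
    d = reuse_distances(trace)
    print(d)
    print(predicted_misses(d, cache_blocks=2))
```

Under this model an access hits when its reuse distance is smaller than the cache capacity in blocks, so one pass over the trace yields miss predictions for any cache size. This linear-scan version takes time proportional to the number of distinct blocks per access; practical tools generally use a tree-based structure instead.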