Restructuring computations for temporal data cache locality

  • Authors:
  • Venkata K. Pingali;Sally A. McKee;Wilson C. Hsieh;John B. Carter

  • Affiliations:
  • Information Sciences Institute, University of Southern California, Los Angeles, California;Electrical and Computer Engineering, Cornell University, Ithaca, New York;School of Computing, 50 S Central Campus Drive, Room 3190, University of Utah, Salt Lake City, Utah;School of Computing, 50 S Central Campus Drive, Room 3190, University of Utah, Salt Lake City, Utah

  • Venue:
  • International Journal of Parallel Programming
  • Year:
  • 2003

Abstract

Data access costs contribute significantly to the execution time of applications with complex data structures. As the latency of memory accesses becomes high relative to processor cycle times, application performance is increasingly limited by memory performance. In some situations it is useful to trade increased computation costs for reduced memory costs. The contributions of this paper are threefold: we provide a detailed analysis of the memory performance of seven memory-intensive benchmarks; we describe Computation Regrouping, a source-level approach to improving the performance of memory-bound applications by increasing temporal locality to eliminate cache and TLB misses; and we demonstrate significant performance improvement by applying Computation Regrouping to our suite of seven benchmarks. Using Computation Regrouping, we observe a geometric mean speedup of 1.90, with individual speedups ranging from 1.26 to 3.03. Most of this improvement comes from eliminating memory stall time.
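
The general idea of reordering work to improve temporal locality can be illustrated with a toy model. The sketch below is not the paper's implementation or one of its benchmarks: the workload, the region and block layout, the function names (`count_misses`, `regroup`, `trace`), and the tiny LRU cache model are all assumptions made for illustration. It buckets independent requests by the data region they touch, so that each region is brought into the cache once and then reused, at the price of some extra bookkeeping.

```python
from collections import OrderedDict, defaultdict

BLOCKS_PER_REGION = 3  # hypothetical: each data region spans 3 cache blocks


def count_misses(accesses, capacity):
    """Count misses of a small, fully associative LRU cache over a block trace."""
    cache = OrderedDict()
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def regroup(requests):
    """Bucket independent requests by region, then emit them region by region.

    This is the extra computation traded for fewer memory misses.
    It is only valid when the requests may be executed in any order.
    """
    buckets = defaultdict(list)
    for region, payload in requests:
        buckets[region].append((region, payload))
    return [req for region in sorted(buckets) for req in buckets[region]]


def trace(requests):
    """Expand requests into block accesses: each request reads its whole region."""
    return [(region, b) for region, _ in requests for b in range(BLOCKS_PER_REGION)]


# Hypothetical workload: 16 requests arriving interleaved across 4 regions.
requests = [(i % 4, i) for i in range(16)]

original_misses = count_misses(trace(requests), capacity=3)
regrouped_misses = count_misses(trace(regroup(requests)), capacity=3)
print(original_misses, regrouped_misses)
```

In this model the cache holds exactly one region, so the interleaved order evicts a region before it is touched again, while the regrouped order reuses each region while it is still resident. Real caches, TLBs, and the benchmarks studied in the paper behave in more complicated ways; the model only shows why grouping accesses to the same data can remove misses.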