Run-Time Reference Clustering for Cache Performance Optimization

  • Authors:
  • Wesley K. Kaplow; Boleslaw K. Szymanski; Peter Tannenbaum; Viktor K. Decyk

  • Venue:
  • PAS '97: Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms/Architecture Synthesis
  • Year:
  • 1997

Abstract

We introduce a method for improving the cache performance of irregular computations in which data are referenced through indirection arrays defined at run time. Such computations arise frequently in scientific applications. The method, called Run-Time Reference Clustering (RTRC), is a run-time analog of the compile-time blocking used for dense matrix problems. RTRC reuses the data partitioning and re-mapping techniques that distributed-memory multiprocessor codes employ to minimize interprocessor communication: re-mapping the data within each processor's local partition decreases cache misses in the same way that re-mapping the global data decreases off-processor references. We demonstrate the applicability and performance of RTRC on several prevalent applications: Sparse Matrix-Vector Multiply, Particle-In-Cell, and CHARMM-like codes. Performance results on SPARC-20, SP-2, and T3-D processors show that RTRC improves single-node execution performance by as much as 35%.
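To make the re-mapping step concrete, here is a minimal Python sketch that clusters the references of a sparse matrix-vector multiply stored in CSR form, where `x` is read through the column-index indirection array. It is not the authors' implementation. RTRC obtains its clusters from the partitioner of a distributed-memory code, but this sketch substitutes the simplest possible clustering rule, first-touch order. The function names `spmv`, `first_touch_order`, and `remap` are hypothetical. The key property is that the data array and the indirection array are permuted together, so the result is unchanged while the entries of `x` that the kernel reads become contiguous in memory.

```python
from typing import List, Sequence, Tuple


def spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
         vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """y = A*x for A in CSR form; x is accessed through the indirection array col_idx."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


def first_touch_order(col_idx: Sequence[int], n: int) -> List[int]:
    """Return a new->old permutation placing x entries in the order the kernel first reads them.

    Stand-in for a real partitioner (an assumption of this sketch, not RTRC itself).
    """
    seen = [False] * n
    order = []
    for j in col_idx:
        if not seen[j]:
            seen[j] = True
            order.append(j)
    # Entries that are never referenced go last, keeping their original order.
    order.extend(j for j in range(n) if not seen[j])
    return order


def remap(x: Sequence[float], col_idx: Sequence[int],
          new_to_old: Sequence[int]) -> Tuple[List[float], List[int]]:
    """Permute the data array and rewrite the indirection array so every reference still hits the same value."""
    old_to_new = [0] * len(new_to_old)
    for new, old in enumerate(new_to_old):
        old_to_new[old] = new
    x_new = [x[old] for old in new_to_old]
    col_new = [old_to_new[j] for j in col_idx]
    return x_new, col_new


# Small example: a 3x6 matrix whose rows read only x[0], x[2], and x[5].
row_ptr = [0, 2, 4, 6]
col_idx = [5, 0, 5, 2, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [10.0, 0.0, 20.0, 0.0, 0.0, 30.0]

perm = first_touch_order(col_idx, len(x))
x_new, col_new = remap(x, col_idx, perm)
y_before = spmv(row_ptr, col_idx, vals, x)
y_after = spmv(row_ptr, col_new, vals, x_new)
```

In this toy example the three entries of `x` that the kernel actually reads move to positions 0 through 2, and `y` is identical before and after re-mapping. In a real code the permutation would typically be computed once and its cost amortized over many executions of the kernel.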