Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP

  • Authors:
  • Yong Yan;Xiaodong Zhang;Zhao Zhang

  • Affiliations:
  • Hewlett Packard Labs, Palo Alto, CA;College of William and Mary, Williamsburg, VA;College of William and Mary, Williamsburg, VA

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

Exploiting cache locality of parallel programs at runtime is a complementary approach to a compiler optimization. This is particularly important for those applications with dynamic memory access patterns. We propose a memory-layout oriented technique to exploit cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems. Guided by application-dependent and targeted architecture-dependent hints, our system, called Cacheminer, reorganizes and partitions a parallel loop using the memory-access space of its execution. Through effective runtime transformations, our system maximizes the data reuse in each partitioned data region assigned in a cache, and minimizes the data sharing among the partitioned data regions assigned to all caches. The executions of tasks in the partitions are scheduled in an adaptive and locality-preserved way to minimize the execution time of programs by trading off load balance and locality. We have implemented the Cacheminer runtime library on two commercial SMP servers and an SimOS simulated SMP. Our simulation and measurement results show that our runtime approach can achieve comparable performance with the compiler optimizations for programs with regular computation and memory-access patterns, whose load balance and cache locality can be well optimized by the tiling and other program transformations. However, our experimental results show that our approach is able to significantly improve the memory performance for the applications with irregular computation and dynamic memory access patterns. These types of programs are usually hard to optimize by static compiler optimizations.