Cache-aware iteration space partitioning

  • Authors:
  • Arun Kejariwal;Alexandru Nicolau;Utpal Banerjee;Alexander V. Veidenbaum;Constantine D. Polychronopoulos

  • Affiliations:
  • UC Irvine, Irvine, USA; UC Irvine, Irvine, USA; Intel, Santa Clara, USA; UC Irvine, Irvine, USA; UIUC, Urbana-Champaign, USA

  • Venue:
  • Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
  • Year:
  • 2008

Abstract

The need for high performance per watt has led to the development of multi-core systems such as the Intel Core 2 Duo processor and the Intel quad-core Kentsfield processor. Maximal exploitation of the hardware parallelism supported by such systems necessitates the development of concurrent software. This entails, in part, program parallelization and efficient mapping of the parallelized program onto the different cores. The latter affects the load balance between the cores, which in turn has a direct impact on performance. Since parallel loops, such as a parallel DO loop in Fortran, account for a large percentage of the total execution time, we focus on the problem of how to efficiently partition the iteration space of parallel loops, which may be perfectly or imperfectly nested. A key aspect of this problem is how to efficiently capture cache behavior, as the cache subsystem is often the main performance bottleneck in multi-core systems. In this paper, we present a novel profile-guided compiler technique for cache-aware partitioning of iteration spaces of parallel loops. We present a case study using a kernel from the industry-standard SPEC CPU benchmark suite.
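The abstract does not describe the authors' partitioning algorithm, so the following is only a generic illustration of why profile information can matter for load balance, not the paper's method. It assumes a single, non-nested parallel loop and a hypothetical list `costs`, where `costs[i]` stands for the profiled cost of iteration `i` (for example, cycles with cache-miss penalties folded in). The helpers `uniform_partition` and `cost_weighted_partition` are hypothetical names: the first splits iterations into equal-sized contiguous chunks, while the second places chunk boundaries so that each core receives roughly the same total profiled cost.

```python
from bisect import bisect_left
from itertools import accumulate


def uniform_partition(n, p):
    """Split iterations 0..n-1 into p contiguous chunks of (nearly) equal size."""
    base, extra = divmod(n, p)
    bounds, start = [], 0
    for k in range(p):
        end = start + base + (1 if k < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def cost_weighted_partition(costs, p):
    """Split iterations into p contiguous chunks of roughly equal total cost.

    costs[i] is a hypothetical profiled cost for iteration i (an assumption,
    not a quantity defined in the paper)."""
    prefix = list(accumulate(costs))  # prefix[i] = total cost of iterations 0..i
    total = prefix[-1] if prefix else 0
    bounds, start = [], 0
    for k in range(1, p):
        target = total * k / p
        # End chunk k right after the first iteration whose cumulative
        # cost reaches the k-th target; never move backwards past start.
        end = bisect_left(prefix, target, lo=start) + 1
        end = min(max(end, start), len(costs))
        bounds.append((start, end))
        start = end
    bounds.append((start, len(costs)))  # last chunk takes the remainder
    return bounds


def chunk_costs(costs, bounds):
    """Total profiled cost assigned to each chunk (i.e. each core)."""
    return [sum(costs[a:b]) for a, b in bounds]


if __name__ == "__main__":
    costs = [4, 4, 1, 1, 1, 1, 1, 1]  # hypothetical: first iterations miss in cache
    u = uniform_partition(len(costs), 2)
    w = cost_weighted_partition(costs, 2)
    print(u, chunk_costs(costs, u))  # [(0, 4), (4, 8)] [10, 4]
    print(w, chunk_costs(costs, w))  # [(0, 2), (2, 8)] [8, 6]
```

With two cores, the equal-sized split leaves one core with a cost of 10 and the other with 4, whereas the cost-weighted split reduces the maximum per-core cost to 8. A real cache-aware scheme would also have to account for how the partition itself changes cache behavior (for example, data sharing between chunks), which this sketch ignores.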