Improving whole-program locality using intra-procedural and inter-procedural transformations

  • Authors:
  • Mahmut Taylan Kandemir

  • Affiliations:
  • Department of Computer Science and Engineering, The Pennsylvania State University, 111 IST Building, University Park, PA 16802-6106, USA

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Exploiting spatial and temporal locality is essential for obtaining high performance on modern computers. Writing programs that exhibit high locality of reference is difficult and error-prone. Compiler researchers have developed loop transformations that allow the conversion of programs to exploit locality. Recently, transformations that change the memory layouts of multi-dimensional arrays-called data transformations-have been proposed. Unfortunately, both data and loop transformations have some important drawbacks. In this work, we present an integrated framework that uses loop and data transformations in concert to exploit the benefits of both approaches while minimizing the impact of their disadvantages. Our approach works inter-procedurally on acyclic call graphs, uses profile data to eliminate layout conflicts, and is unique in its capability of resolving conflicting layout requirements of different references to the same array in the same nest and in different nests for regular array-based applications. The optimization technique presented in this paper has been implemented in a source-to-source translator. We evaluate its performance using standard benchmark suites and several math libraries (complete programs) with large input sizes. Our experimental results show that the proposed approach improves the performance of the applications optimized by using the current state-of-the-art techniques by 8.2% on the average. This reduction comes from three important characteristics of the technique, namely, resolving layout conflicts between references to the same array in a loop nest, determining a suitable order to propagate layout modifications across loop nests, and propagating layouts between different procedures in the program-all in a unified framework. The locality optimization technique presented in this paper tries to exploit locality in the innermost loop positions. This strategy, in most cases, generates dependence-free outer loops, which can be safely parallelized.