A hierarchical locality algorithm for NUMA compilation

  • Authors: M. O'Boyle
  • Venue: PDP '95 Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing
  • Year: 1995

Abstract

A compiler algorithm is described that exploits program locality and reduces latency overhead in parallel hierarchical-memory machines. By applying the appropriate transformation at different levels of the hierarchy, the number of nonlocal accesses between processors is minimised. Similarly, the memory structure within a processor is exploited, thereby reducing the amount of communication between local main memory and private cache. The algorithm is based on a compound sequence of transformations that goes beyond the unimodular transformations described in previous work, and it can exploit locality in complex array accesses and general iteration spaces. Furthermore, through strip mining and a novel use of data alignment, excessive storage for temporaries can be prevented.
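
As a rough illustration of how strip mining can bound temporary storage, the sketch below shows the generic textbook transformation, not the algorithm from the paper. The function names, the two-phase computation (double, then add one), and the strip size are all hypothetical choices made for this example.

```python
def two_phase_full(a):
    """Baseline: the intermediate result needs a temporary as large as the input."""
    tmp = [x * 2 for x in a]          # full-size temporary, len(a) elements
    return [t + 1 for t in tmp]


def two_phase_strip_mined(a, strip=4):
    """Strip-mined version: the loop over i is split into an outer loop over
    strips and inner loops within one strip, so the temporary only has to
    hold `strip` elements at a time."""
    n = len(a)
    out = [0] * n
    tmp = [0] * strip                 # temporary sized to a single strip
    for start in range(0, n, strip):  # outer loop: one iteration per strip
        end = min(start + strip, n)   # the last strip may be shorter
        for i in range(start, end):   # phase 1 on this strip
            tmp[i - start] = a[i] * 2
        for i in range(start, end):   # phase 2 on this strip
            out[i] = tmp[i - start] + 1
    return out
```

Both functions compute the same result. The strip-mined form trades one full-size temporary for a fixed-size buffer, and a strip small enough to stay in cache is also reused while it is still resident.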