Locality Analysis for Parallel C Programs

  • Authors: Yingchun Zhu; Laurie J. Hendren

  • Affiliation: McGill Univ., Montreal, P.Q., Canada (both authors)

  • Venue: IEEE Transactions on Parallel and Distributed Systems
  • Year: 1999

Abstract

Many parallel architectures support a memory model in which some memory accesses are local and thus inexpensive, while others are remote and potentially quite expensive. For memory references via pointers, it is often difficult to determine whether a reference is guaranteed to be local and can therefore be handled by an inexpensive memory operation. Determining which memory accesses are local can be done by the programmer, the compiler, or a combination of both. The overall goal is to minimize the work required of the programmer and have the compiler automate the process as much as possible. This paper reports on compiler techniques for determining when indirect memory references are local. The locality analysis has been implemented for a parallel dialect of C called EARTH-C, and it uses an algorithm inspired by type inference algorithms for fast points-to analysis. The algorithm statically estimates when an indirect reference via a pointer can be safely assumed to be a local access. The locality inference algorithm is also used to guide the automatic specialization of functions in order to take advantage of locality specific to particular calling contexts. In addition to these purely static techniques, we also suggest fine-grain and coarse-grain dynamic techniques, in which dynamic locality checks are inserted into the program together with specialized code for the local case. In the fine-grain case, the checks are placed around single memory references, while in the coarse-grain case they are placed around larger program segments. The static locality analysis and automatic specialization have been implemented in the EARTH-C compiler, which produces low-level threaded code for the EARTH multithreaded architecture. Experimental results are presented for a set of benchmarks that operate on irregular, dynamically allocated data structures. Overall, the techniques give moderate to significant speedups, with the combination of static and dynamic techniques giving the best performance overall.
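
The difference between the fine-grain and coarse-grain dynamic checks described in the abstract can be illustrated with a minimal sketch in plain C. This is not EARTH-C and not the code the compiler generates: the `gptr_t` type, `is_local`, `remote_read`, and the two counters are hypothetical names introduced only for illustration, and the remote access is simulated by an ordinary read that increments a counter. The coarse-grain version also assumes that the whole array resides on the node that owns its base pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "global pointer": an owning node id plus an ordinary address.
   This is an illustrative stand-in, not an EARTH-C type. */
typedef struct {
    int  node;   /* node that owns the data */
    int *addr;   /* address within that node's memory */
} gptr_t;

static int  my_node = 0;          /* id of the node running this code (assumed) */
static long locality_checks = 0;  /* number of dynamic checks executed */
static long remote_ops = 0;       /* number of simulated remote reads */

/* Dynamic locality check: is the data owned by the current node? */
static int is_local(gptr_t p) {
    locality_checks++;
    return p.node == my_node;
}

/* Simulated expensive remote read; a real system would issue a
   communication operation here instead of a plain dereference. */
static int remote_read(gptr_t p) {
    remote_ops++;
    return *p.addr;
}

/* Fine-grain: every single memory reference is guarded by its own check. */
static int sum_fine(gptr_t base, size_t n) {
    assert(base.addr != NULL);
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        gptr_t p = { base.node, base.addr + i };
        sum += is_local(p) ? *p.addr : remote_read(p);
    }
    return sum;
}

/* Coarse-grain: one check guards the whole loop, which is specialized
   into a local version and a general (remote) version. */
static int sum_coarse(gptr_t base, size_t n) {
    assert(base.addr != NULL);
    int sum = 0;
    if (is_local(base)) {
        /* Specialized local code: plain loads, no per-access checks. */
        for (size_t i = 0; i < n; i++)
            sum += base.addr[i];
    } else {
        /* General code: every access goes through the remote path. */
        for (size_t i = 0; i < n; i++) {
            gptr_t p = { base.node, base.addr + i };
            sum += remote_read(p);
        }
    }
    return sum;
}
```

In this sketch the fine-grain version pays one check per element, whereas the coarse-grain version pays a single check per call at the cost of duplicating the loop body. That duplication is the kind of specialized code for the local case that the abstract mentions.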