The trade-off between implicit and explicit data distribution in shared-memory programming paradigms

  • Authors:
  • Dimitrios S. Nikolopoulos;Eduard Ayguadé;Theodore S. Papatheodorou;Constantine D. Polychronopoulos;Jesús Labarta

  • Affiliations:
  • Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main Street, Urbana, IL;Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, c/Jordi Girona 1-3, 08034 Barcelona, Spain;Department of Computer Engineering and Informatics, University of Patras, Rion, 26500 Patras, Greece;Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main Street, Urbana, IL;Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, c/Jordi Girona 1-3, 08034 Barcelona, Spain

  • Venue:
  • ICS '01 Proceedings of the 15th international conference on Supercomputing
  • Year:
  • 2001

Abstract

This paper explores previously established and novel methods for scaling the performance of OpenMP on NUMA architectures. The spectrum of methods under investigation includes OS-level automatic page placement algorithms, dynamic page migration, and manual data distribution. The trade-off that these methods face lies between performance and programming effort. Automatic page placement algorithms are transparent to the programmer, but may compromise memory access locality. Dynamic page migration is also transparent, but requires careful engineering of online algorithms to be effective. Manual data distribution, on the other hand, requires substantial programming effort and architecture-specific extensions to OpenMP, but may localize memory accesses in a nearly optimal manner.

The main contributions of the paper are: a classification of application characteristics, which clearly identifies the conditions under which transparent methods are both capable and sufficient for optimizing memory locality in an OpenMP program; and the use of two novel runtime techniques, runtime data distribution based on memory access traces and affinity scheduling with iteration schedule reuse, as competitive substitutes for manual data distribution in several important classes of applications.
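
To make the idea of transparent locality concrete, here is a minimal sketch in C with OpenMP. It is not the paper's runtime system. It assumes the operating system uses a first-touch page placement policy (a common automatic policy, though the abstract does not name one) and that threads stay bound to the same processors across loops. The function names `init_first_touch` and `scale_add` are hypothetical. Both loops use the same static schedule, so each thread is likely to revisit the array pages it touched first, which is a simple form of reusing an iteration schedule.

```c
#include <stddef.h>

/* Hypothetical helper: initialize the arrays in parallel. Under an assumed
 * first-touch policy, each page is placed on the NUMA node of the thread
 * that writes it first. */
void init_first_touch(double *a, double *b, size_t n)
{
    long i;
    #pragma omp parallel for schedule(static)
    for (i = 0; i < (long)n; i++) {
        a[i] = 0.0;
        b[i] = (double)i;
    }
}

/* Hypothetical compute loop: reuses the same static schedule, so with the
 * same thread count each thread works on the index range it initialized. */
void scale_add(double *a, const double *b, size_t n, double alpha)
{
    long i;
    #pragma omp parallel for schedule(static)
    for (i = 0; i < (long)n; i++) {
        a[i] += alpha * b[i];
    }
}
```

A caller would allocate `a` and `b`, call `init_first_touch` once, and then call `scale_add` repeatedly. Without OpenMP enabled the pragmas are ignored and the code runs serially with the same results. Whether locality actually improves depends on the platform's placement policy and thread binding, which this sketch only assumes.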