A transparent runtime data distribution engine for OpenMP

  • Authors:
  • Dimitrios S. Nikolopoulos;Theodore S. Papatheodorou;Constantine D. Polychronopoulos;Jesús Labarta;Eduard Ayguadé

  • Affiliations:
  • Computer and Systems Research Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main Street, Urbana, IL 61801, USA. E-mail: dsn@csrd.uiuc.edu (Correspd.);Department of Computer Engineering and Informatics, University of Patras, GR26500, Patras, Greece. E-mail: tsp@hpclab.ceid.upatras.gr;Computer and Systems Research Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main Street, Urbana, IL 61801, USA. E-mail: {dsn,cdp}@csrd.uiuc.edu;Department of Computer Architecture, Technical University of Catalonia, c/Jordi Girona 1-3, 08034, Barcelona, Spain. E-mail: {jesus,eduard}@ac.upc.es;Department of Computer Architecture, Technical University of Catalonia, c/Jordi Girona 1-3, 08034, Barcelona, Spain. E-mail: {jesus,eduard}@ac.upc.es

  • Venue:
  • Scientific Programming
  • Year:
  • 2000

Abstract

This paper makes two contributions. First, it investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that, because of the low remote-to-local memory access latency ratio of contemporary NUMA architectures, reasonably balanced page placement schemes, such as round-robin or random distribution, incur only modest performance losses. Second, the paper presents a transparent, user-level page migration engine that can recover any performance loss stemming from suboptimal placement of pages in iterative OpenMP programs. The main body of the paper describes how our OpenMP runtime environment uses page migration to implement implicit data distribution and redistribution schemes without programmer intervention. Our experimental results verify the effectiveness of the proposed framework and provide a proof of concept that it is not necessary to introduce data distribution directives in OpenMP, which preserves the simplicity and portability of the programming model.
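To make the two ideas in the abstract concrete, the following is a minimal simulation sketch, not the engine described in the paper. It models round-robin page placement and a single migration pass driven by per-node access counts. The function names, the `threshold` parameter, and the migration criterion (move a page when its busiest remote node exceeds the home node's accesses by a given factor) are all assumptions made for illustration; the abstract does not state the criterion the authors use.

```python
from typing import List


def round_robin_placement(num_pages: int, num_nodes: int) -> List[int]:
    """Place page p on node p % num_nodes (balanced, but locality-oblivious)."""
    return [p % num_nodes for p in range(num_pages)]


def migrate_pages(placement: List[int],
                  access_counts: List[List[int]],
                  threshold: float = 2.0) -> List[int]:
    """Return a new placement after one hypothetical migration pass.

    access_counts[p][n] is the number of accesses to page p issued from
    node n during the last iteration of the program. A page moves to its
    most frequent accessor when that node's count exceeds `threshold`
    times the count of the current home node (an assumed criterion).
    """
    new_placement = list(placement)
    for page, counts in enumerate(access_counts):
        home = placement[page]
        best = max(range(len(counts)), key=lambda n: counts[n])
        if best != home and counts[best] > threshold * counts[home]:
            new_placement[page] = best
    return new_placement


def remote_fraction(placement: List[int],
                    access_counts: List[List[int]]) -> float:
    """Fraction of all accesses that target a page homed on another node."""
    total = sum(sum(counts) for counts in access_counts)
    remote = sum(count
                 for page, counts in enumerate(access_counts)
                 for node, count in enumerate(counts)
                 if node != placement[page])
    return remote / total if total else 0.0


# Made-up access profile for 4 pages on 2 nodes (illustrative data only).
counts = [[90, 10], [80, 20], [5, 95], [40, 60]]
initial = round_robin_placement(num_pages=4, num_nodes=2)
tuned = migrate_pages(initial, counts)
print(initial, remote_fraction(initial, counts))
print(tuned, remote_fraction(tuned, counts))
```

In an iterative program the access pattern of one iteration tends to repeat in the next, so a pass like this could run between iterations and reuse the counters it has just collected. On a real system those counters would come from hardware or operating system support rather than from a Python list.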