Scaling non-regular shared-memory codes by reusing custom loop schedules

  • Authors:
  • Dimitrios S. Nikolopoulos; Ernest Artiaga; Eduard Ayguadé; Jesús Labarta

  • Affiliations:
  • Dimitrios S. Nikolopoulos: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (corresponding address: Department of Computer Science, The College of William & Mary, Williamsburg, VA 23187-8795, USA). E-mail: dsn@csrd.uiuc.edu
  • Ernest Artiaga, Eduard Ayguadé, Jesús Labarta: Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, c/Jordi Girona 1-3, Mòdul D6, Barcelona 08034, Spain. E-mail: {ernest, eduard, jesus}@ac.upc.es

  • Venue:
  • Scientific Programming - OpenMP
  • Year:
  • 2003


Abstract

In this paper we explore the idea of customizing and reusing loop schedules to improve the scalability of non-regular numerical codes on shared-memory architectures with non-uniform memory access latency. The main objective is to implicitly set up affinity links between threads and data by devising loop schedules that achieve a balanced work distribution over irregular data spaces, and by reusing these schedules as much as possible throughout the execution of the program to improve memory access locality. This transformation provides a great deal of flexibility in optimizing locality without compromising the simplicity of the shared-memory programming paradigm; in particular, the programmer does not need to explicitly distribute data among processors. The paper presents practical examples from real applications, together with experiments showing the efficiency of the approach.
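
The sketch below is only an illustration of the idea described in the abstract, not the paper's actual runtime support: a load-balanced iteration-to-thread assignment for an irregular loop is computed once and then reused across all time steps, so each thread repeatedly touches the same portion of the data and implicit thread-data affinity emerges. All names (`cost[]`, `build_schedule`, the block bounds) are hypothetical.

```c
/* Minimal sketch, assuming OpenMP and a simple greedy block partition. */
#include <omp.h>
#include <stdlib.h>

#define N      100000
#define STEPS  50
#define MAXTHR 256          /* upper bound on threads for this sketch */

static double data[N];
static int    cost[N];               /* per-iteration work estimate (irregular) */
static int    start[MAXTHR], end_[MAXTHR];  /* thread t owns [start[t], end_[t]) */

/* Greedy prefix partition: give each thread a contiguous block of
 * iterations with roughly equal total cost.  Computed once, reused later. */
static void build_schedule(int nthreads) {
    long total = 0, acc = 0;
    for (int i = 0; i < N; i++) total += cost[i];
    long target = total / nthreads;
    int t = 0;
    start[0] = 0;
    for (int i = 0; i < N; i++) {
        acc += cost[i];
        if (acc >= target && t < nthreads - 1) {
            end_[t] = i + 1;
            start[++t] = i + 1;
            acc = 0;
        }
    }
    end_[nthreads - 1] = N;
}

int main(void) {
    for (int i = 0; i < N; i++) cost[i] = 1 + rand() % 16;  /* irregular costs */
    int nthreads = omp_get_max_threads();
    if (nthreads > MAXTHR) nthreads = MAXTHR;
    build_schedule(nthreads);            /* balanced schedule, built once */

    for (int step = 0; step < STEPS; step++) {
        /* Reuse the same schedule every step: thread t always works on
         * [start[t], end_[t]), so the pages it touches stay local to it. */
        #pragma omp parallel num_threads(nthreads)
        {
            int t = omp_get_thread_num();
            for (int i = start[t]; i < end_[t]; i++)
                data[i] += cost[i] * 0.5;   /* stand-in for irregular work */
        }
    }
    return 0;
}
```

The design point this sketch tries to convey is that the schedule, not the data layout, carries the locality information: because the assignment is stable across outer iterations, a first-touch NUMA placement policy naturally keeps each thread's block of `data[]` on its local memory node without any explicit data distribution by the programmer.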