Is the schedule clause really necessary in OpenMP?

  • Authors:
  • Eduard Ayguadé, Bob Blainey, Alejandro Duran, Jesús Labarta, Francisco Martínez, Xavier Martorell, Raúl Silvera

  • Affiliations:
  • CEPBA-IBM Research Institute, Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain (Ayguadé, Duran, Labarta, Martínez, Martorell); IBM Toronto Lab, Markham, ON, Canada (Blainey, Silvera)

  • Venue:
  • WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
  • Year:
  • 2003

Abstract

Choosing the appropriate assignment of loop iterations to threads is one of the most important decisions that must be made when parallelizing loops, the main source of parallelism in numerical applications. This is not an easy task, even for expert programmers, and it can potentially take a large amount of time. OpenMP offers the schedule clause, with a set of predefined iteration-scheduling strategies, to specify how (and when) this assignment of iterations to threads is done. In some cases, the best schedule depends on characteristics of the target architecture or on the input data, making the code less portable. Even worse, the best schedule can change during execution, depending on dynamic changes in the behavior of the loop or in the resources available in the system. Also, for certain types of imbalanced loops, the schedulers already proposed in the literature are unable to extract the maximum parallelism because they do not appropriately trade off load balancing and data locality. This paper proposes a new scheduling strategy that derives at run time the best scheduling policy for each parallel loop in the program, based on information gathered at runtime by the library itself.