Is the schedule clause really necessary in OpenMP?

Authors:
Eduard Ayguadé;Bob Blainey;Alejandro Duran;Jesús Labarta;Francisco Martínez;Xavier Martorell;Raúl Silvera
Affiliations:
CEPBA, IBM Research Institute, Departament d'Arquitectura de Computadors, Universitat Politécnica de Catalunya, Barcelona, Spain;IBM Toronto Lab, Markham, ON, Canada;CEPBA, IBM Research Institute, Departament d'Arquitectura de Computadors, Universitat Politécnica de Catalunya, Barcelona, Spain;CEPBA, IBM Research Institute, Departament d'Arquitectura de Computadors, Universitat Politécnica de Catalunya, Barcelona, Spain;CEPBA, IBM Research Institute, Departament d'Arquitectura de Computadors, Universitat Politécnica de Catalunya, Barcelona, Spain;CEPBA, IBM Research Institute, Departament d'Arquitectura de Computadors, Universitat Politécnica de Catalunya, Barcelona, Spain;IBM Toronto Lab, Markham, ON, Canada
Venue:
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Year:
2003

Abstract

Choosing the appropriate assignment of loop iterations to threads is one of the most important decisions to be made when parallelizing loops, the main source of parallelism in numerical applications. This is not an easy task, even for expert programmers, and it can take a large amount of time. OpenMP offers the schedule clause, with a set of predefined iteration scheduling strategies, to specify how (and when) this assignment of iterations to threads is done. In some cases, the best schedule depends on characteristics of the target architecture, the input data, and other factors, which makes the code less portable. Even worse, the best schedule can change during execution, depending on dynamic changes in the behavior of the loop or in the resources available in the system. Also, for certain types of imbalanced loops, the schedulers already proposed in the literature cannot extract the maximum parallelism because they do not appropriately trade off load balancing and data locality. This paper proposes a new scheduling strategy that derives at run time the best scheduling policy for each parallel loop in the program, based on information gathered at run time by the library itself.
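
For readers unfamiliar with the schedule clause, the following is a minimal sketch in C, assuming an OpenMP-capable compiler (for example, one invoked with an OpenMP flag such as -fopenmp). The kernel work and the function names are hypothetical and chosen only for illustration; they do not come from the paper. The loop is deliberately imbalanced, since iteration i performs i + 1 units of work, so the choice of schedule affects how evenly the threads are loaded. The strategy proposed in the paper is not shown here; the sketch only illustrates the standard clause the programmer would otherwise have to tune by hand.

```c
/* Hypothetical imbalanced kernel: iteration i performs i + 1 units of work. */
static double work(int i) {
    double s = 0.0;
    for (int k = 0; k <= i; ++k) {
        s += 1.0;
    }
    return s;
}

/* Static schedule: iterations are divided among threads before the loop runs.
   With this kernel, threads that receive high-numbered iterations do more work. */
double sum_static(int n) {
    double total = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += work(i);
    }
    return total;
}

/* Dynamic schedule: threads request chunks of 4 iterations as they become idle.
   The chunk size 4 is an arbitrary value picked for this sketch. */
double sum_dynamic(int n) {
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += work(i);
    }
    return total;
}

/* Runtime schedule: the policy is read from the OMP_SCHEDULE environment
   variable when the program runs, so it can change without recompiling. */
double sum_runtime(int n) {
    double total = 0.0;
    #pragma omp parallel for schedule(runtime) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += work(i);
    }
    return total;
}
```

All three functions compute the same result; only the assignment of iterations to threads differs. If the code is compiled without OpenMP support, the pragmas are ignored and the loops run sequentially.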