Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors

Authors:
E. P. Markatos; T. J. LeBlanc
Affiliations:
-;-
Venue:
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Year:
1992

Cited:
Using processor affinity in loop scheduling on shared-memory multiprocessors. Proceedings of the 1992 ACM/IEEE conference on Supercomputing.
Self-scheduling on distributed-memory machines. Proceedings of the 1993 ACM/IEEE conference on Supercomputing.
Runtime Empirical Selection of Loop Schedulers on Hyperthreaded SMPs. IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01.
Is the schedule clause really necessary in OpenMP? WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming.

Abstract

Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism, and thereby minimize program execution time, is to execute loop iterations in parallel on different processors. Traditional approaches to loop scheduling attempt to achieve the minimum completion time by distributing the workload as evenly as possible, while minimizing the number of synchronization operations required. In this paper we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to non-local data. We show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. We propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. We compare the performance of this new algorithm to other known algorithms using five representative applications on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, and a Sequent Symmetry, and show that the new algorithm offers substantial performance improvements, up to a factor of 3 in some cases. We conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.
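The three goals named in the abstract (balanced workload, few synchronization operations, and iterations kept near their data) can be illustrated with a small simulation. The sketch below is not taken from the abstract, which gives no algorithmic detail. It assumes one plausible design: each processor owns a contiguous block of iterations, repeatedly takes a 1/k fraction of what remains in its own queue, and only when that queue is empty takes a 1/P fraction from the most loaded processor. The function name `affinity_schedule`, the parameter `k`, and the `speeds` knob for modelling faster or slower processors are all hypothetical.

```python
from math import ceil

def affinity_schedule(n_iters, n_procs, k=None, speeds=None):
    """Simulate a locality-aware loop scheduler; return {iteration: processor}.

    Illustrative assumptions (not from the source): k defaults to n_procs,
    and speeds[p] is the number of chunks processor p grabs per round.
    At least one entry of speeds must be positive, or the loop never ends.
    """
    k = k or n_procs
    speeds = speeds or [1] * n_procs
    # Static initial partition: processor p "owns" iterations [lo, hi).
    bounds = [(p * n_iters) // n_procs for p in range(n_procs + 1)]
    queues = [[bounds[p], bounds[p + 1]] for p in range(n_procs)]
    ran_on = {}

    def remaining(q):
        return queues[q][1] - queues[q][0]

    while any(remaining(q) for q in range(n_procs)):
        for p in range(n_procs):
            for _ in range(speeds[p]):
                if remaining(p) > 0:
                    # Local work: take a shrinking chunk from the front
                    # of the processor's own queue (no remote access).
                    take = ceil(remaining(p) / k)
                    lo = queues[p][0]
                    queues[p][0] += take
                    chunk = range(lo, lo + take)
                else:
                    # Idle: migrate work from the most loaded processor.
                    victim = max(range(n_procs), key=remaining)
                    if remaining(victim) == 0:
                        break  # nothing left anywhere
                    take = ceil(remaining(victim) / n_procs)
                    hi = queues[victim][1]
                    queues[victim][1] -= take
                    chunk = range(hi - take, hi)
                for i in chunk:
                    ran_on[i] = p
    return ran_on

# Equal speeds: every iteration stays on its home processor.
balanced = affinity_schedule(100, 4)
# Processor 0 is three times faster, so it finishes early and takes remote work.
skewed = affinity_schedule(100, 4, speeds=[3, 1, 1, 1])
```

When the load is even, no iteration leaves its home processor, so data loaded during one execution of the loop would still be local on the next. Iterations migrate only when an imbalance appears, which is the trade-off the abstract describes between load balance and data location.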