Computation migration: enhancing locality for distributed-memory parallel systems
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data locality and load balancing in COOL
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Communication optimizations for parallel computing using data access information
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Dynamic computation migration in DSM systems
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Simulation of the 3 dimensional cascade flow with numerical wind tunnel (NWT)
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Supporting dynamic data structures with Olden
Compiler optimizations for scalable parallel systems
Reinventing scheduling for multicore systems
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
On recent high-performance multiprocessors, there is a potential conflict between the goals of achieving the full performance potential of the hardware and providing a parallel programming environment that makes effective use of programmer effort. On one hand, an explicit coarse-grain programming style may appear to be necessary, both to achieve good cache performance and to limit the amount of overhead due to context switching and synchronization. On the other hand, it may be more expedient to use more natural and finer-grain programming styles based on abstractions such as task heaps, lightweight threads, parallel loops, or object-oriented parallelism. Unfortunately, using these styles can cause a loss of performance due to poor locality and high overhead. We claim that the locality problem in fine-grain parallel programs can be addressed effectively by using object-affinity scheduling, and that the overhead can be reduced substantially by representing tasks as templates that are managed using continuation-passing-style mechanisms. We present supporting evidence for these claims in the form of experimental measurements of programs running on Mercury, an object-oriented system implemented on an SGI 4D/480 multiprocessor.
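The core idea of object-affinity scheduling described above can be sketched as follows. This is a hypothetical illustration, not Mercury's actual implementation: each shared object is pinned to a "home" worker, and every task that operates on that object is enqueued on that worker, so the object's cache lines tend to stay warm on one processor instead of bouncing between caches. The class and method names (`AffinityScheduler`, `submit`, `run_worker`) are invented for this sketch.

```python
# Hypothetical sketch of object-affinity scheduling (not Mercury's API):
# tasks are routed to the home worker of the object they touch, improving
# cache locality for fine-grain parallel programs.
from collections import defaultdict


class AffinityScheduler:
    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.queues = defaultdict(list)  # worker id -> pending tasks
        self.home = {}                   # object id -> home worker

    def submit(self, obj_id, task):
        # The first task on an object fixes its home worker; all later
        # tasks on the same object follow it to the same queue.
        worker = self.home.setdefault(obj_id, hash(obj_id) % self.num_workers)
        self.queues[worker].append(task)
        return worker

    def run_worker(self, worker):
        # Drain one worker's queue sequentially; tasks never migrate,
        # so an object's data stays in one processor's cache.
        results = [task() for task in self.queues[worker]]
        self.queues[worker].clear()
        return results
```

Under this scheme, two tasks submitted against the same object are guaranteed to run on the same worker, which is the locality property the abstract attributes to object-affinity scheduling.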