Computation migration: enhancing locality for distributed-memory parallel systems
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data locality and load balancing in COOL
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Communication optimizations for parallel computing using data access information
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Dynamic computation migration in DSM systems
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Simulation of the 3 dimensional cascade flow with numerical wind tunnel (NWT)
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Supporting dynamic data structures with Olden
Compiler optimizations for scalable parallel systems
Reinventing scheduling for multicore systems
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
On recent high-performance multiprocessors, there is a potential conflict between the goals of achieving the full performance potential of the hardware and providing a parallel programming environment that makes effective use of programmer effort. On one hand, an explicit coarse-grain programming style may appear to be necessary, both to achieve good cache performance and to limit the amount of overhead due to context switching and synchronization. On the other hand, it may be more expedient to use more natural and finer-grain programming styles based on abstractions such as task heaps, lightweight threads, parallel loops, or object-oriented parallelism. Unfortunately, using these styles can cause a loss of performance due to poor locality and high overhead. We claim that the locality problem in fine-grain parallel programs can be addressed effectively by using object-affinity scheduling, and that the overhead can be reduced substantially by representing tasks as templates that are managed using continuation-passing-style mechanisms. We present supporting evidence for these claims in the form of experimental measurements of programs running on Mercury, an object-oriented system implemented on an SGI 4D/480 multiprocessor.
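The core idea of object-affinity scheduling described above can be sketched as follows. This is a hypothetical illustration, not Mercury's actual implementation: each shared object is pinned to a "home" worker, and every task that operates on that object is enqueued on that worker, so the object's cache lines tend to stay warm on one processor instead of bouncing between caches. The class and method names (`AffinityScheduler`, `submit`, `run_worker`) are invented for this sketch.

```python
# Hypothetical sketch of object-affinity scheduling (not Mercury's API):
# tasks are routed to the home worker of the object they touch, improving
# cache locality for fine-grain parallel programs.
from collections import defaultdict


class AffinityScheduler:
    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.queues = defaultdict(list)  # worker id -> pending tasks
        self.home = {}                   # object id -> home worker

    def submit(self, obj_id, task):
        # The first task on an object fixes its home worker; all later
        # tasks on the same object follow it to the same queue.
        worker = self.home.setdefault(obj_id, hash(obj_id) % self.num_workers)
        self.queues[worker].append(task)
        return worker

    def run_worker(self, worker):
        # Drain one worker's queue sequentially; tasks never migrate,
        # so an object's data stays in one processor's cache.
        results = [task() for task in self.queues[worker]]
        self.queues[worker].clear()
        return results
```

Under this scheme, two tasks submitted against the same object are guaranteed to run on the same worker, which is the locality property the abstract attributes to object-affinity scheduling.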