Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Locality Enhancement for Large-Scale Shared-Memory Multiprocessors
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Locality optimizations for jacobi iteration on distributed parallel systems
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Abstract: Tiling exploits temporal reuse carried by an outer loop of a loop nest to enhance cache locality. Loop skewing is typically required to make tiling legal, and this restricts parallelism to wavefronts in the tiled iteration space. For a small number of processors, wavefront parallelism can be exploited efficiently using dynamic self-scheduling with a large tile size. Such a strategy enhances intratile locality but does not necessarily enhance intertile locality. We show that dynamic self-scheduling performs poorly on scalable shared-memory multiprocessors, where smaller tiles are necessary to provide sufficient parallelism; smaller tiles, in turn, place greater importance on intertile locality. We propose static scheduling strategies that enhance intertile locality for small tiles. Results of experiments on a Convex SPP1000 multiprocessor demonstrate that our strategies outperform dynamic self-scheduling by a factor of up to 2.3 on 30 processors.
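The mechanics described above (skew so that tiling is legal, run tiles wavefront by wavefront, then decide which processor gets each tile) can be illustrated with a minimal Python sketch. It assumes a hypothetical in-place 1-D three-point stencil swept T times, which is not necessarily the paper's benchmark. Skewing the inner index as ip = i + t turns every dependence distance into (0,1), (1,0) or (1,1), so rectangular BT x BI tiles are legal and all tiles with the same t-tile + ip-tile sum form one independent wavefront. The two owner policies are illustrative assumptions, not the strategies evaluated in the paper. The first is a column-cyclic static assignment (`static_owner`). The second is a round-robin queue used as a crude stand-in for dynamic self-scheduling (`round_robin_owners`). `reuse_fraction` is a hypothetical proxy for intertile locality. All function names are made up for this sketch.

```python
from collections import defaultdict


def reference(a, T):
    """Original nest: the outer time loop t carries the temporal reuse."""
    a = list(a)
    N = len(a)
    for t in range(T):
        for i in range(1, N - 1):
            a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a


def tiles_by_wavefront(T, N, BT, BI):
    """Group BT x BI tiles of the skewed space (t, ip), ip = i + t, by wavefront tt + ii."""
    n_tt = -(-T // BT)               # ceil(T / BT)
    n_ii = -(-(N - 2 + T) // BI)     # skewed ip spans [0, N + T - 3]
    waves = defaultdict(list)
    for tt in range(n_tt):
        for ii in range(n_ii):
            waves[tt + ii].append((tt, ii))
    return [waves[w] for w in sorted(waves)]


def run_tile(a, tile, T, N, BT, BI):
    """Execute one tile in lexicographic (t, ip) order, mapping back i = ip - t."""
    tt, ii = tile
    for t in range(tt * BT, min((tt + 1) * BT, T)):
        for ip in range(ii * BI, (ii + 1) * BI):
            i = ip - t
            if 1 <= i <= N - 2:      # skip points outside the original iteration space
                a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0


def static_owner(tile, P):
    """Hypothetical static policy: tile column ii always runs on processor ii % P,
    so each tile runs where its predecessor (tt - 1, ii) produced the values it reads."""
    return tile[1] % P


def round_robin_owners(waves, P):
    """Crude stand-in for self-scheduling: tiles leave one shared queue in
    wavefront order, so ownership ignores which tiles share data."""
    owners, k = {}, 0
    for wave in waves:
        for tile in wave:
            owners[tile] = k % P
            k += 1
    return owners


def reuse_fraction(owners):
    """Proxy for intertile locality: share of tiles (tt > 0) whose predecessor
    (tt - 1, ii) was assigned to the same processor."""
    pairs = [(tt, ii) for (tt, ii) in owners if tt > 0]
    same = sum(owners[(tt, ii)] == owners[(tt - 1, ii)] for tt, ii in pairs)
    return same / len(pairs)


def wavefront_tiled(a, T, BT, BI, P):
    """Run the skewed, tiled nest wave by wave. Tiles in one wave are independent,
    so they are grouped by static owner to mimic P processors running concurrently."""
    a = list(a)
    N = len(a)
    for wave in tiles_by_wavefront(T, N, BT, BI):
        for p in range(P):
            for tile in wave:
                if static_owner(tile, P) == p:
                    run_tile(a, tile, T, N, BT, BI)
    return a


if __name__ == "__main__":
    a0 = [float(i % 7) for i in range(40)]
    T, BT, BI, P = 12, 3, 5, 4
    # Same operations on the same inputs, so the results match exactly.
    assert wavefront_tiled(a0, T, BT, BI, P) == reference(a0, T)
    waves = tiles_by_wavefront(T, len(a0), BT, BI)
    static = {tile: static_owner(tile, P) for wave in waves for tile in wave}
    print("static reuse:", reuse_fraction(static))
    print("round-robin reuse:", reuse_fraction(round_robin_owners(waves, P)))
```

In this sketch the column-cyclic policy always places a tile on the processor that produced the values it consumes from the tile below it in t. The round-robin queue does not. This is the kind of intertile reuse the abstract says becomes important as tiles shrink. Whether such a policy pays off on real hardware depends on cache capacity, tile shape, and load balance across wavefronts, none of which the sketch models.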