Integration, the VLSI Journal
In this paper, we present a simple and efficient solution to the parallelization of programs that perform discrete integration of ordinary differential equations (ODEs). The main technique used is loop tiling. To avoid the overhead due to code complexity and border effects, we introduce redundant tasks and use non-parallelepiped tiles. Thanks to both cache reuse (×4.3) and coarse granularity (×24.5), the speedup on 25 processors over the non-tiled sequential implementation is larger than 106. We also present a draft of a fuzzy methodology for optimizing the tile size, and we illustrate it using real measurements of the communication cost and the execution time. In particular, we observe that the model of communication latencies over a Myrinet network is not as simple as is usually reported.