This paper presents a method for determining the cache performance of the loop nests in a program. The cache-miss data are produced by simulating the execution of each loop nest on an architecturally parameterized cache simulator. We show that cache-miss rates are highly non-linear with respect to the loop ranges and correlate well with the performance of the loop nests on actual target machines. The cache-miss ratio is used to guide program optimizations such as loop interchange and iteration-space blocking; it can also be used to estimate a program's running time. Both applications are important in scheduling programs for parallel execution. We present examples of program optimization for several popular target machines and processors, such as the IBM 9076 SP1, the SuperSPARC and the Intel i860.
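The simulator itself is not described here, so the following is only a hypothetical, minimal stand-in for the idea: a set-associative cache with LRU replacement, parameterized by size, line size and associativity, driven by the address trace of one loop nest. The names `CacheSim` and `simulate_loop_nest`, the cache parameters (8 KiB, 32-byte lines, 2-way), and the loop nest (summing a row-major N×N array of 8-byte doubles) are all illustrative assumptions, not details from the paper.

```python
from collections import OrderedDict

ELEM_BYTES = 8  # assumed element size: one double-precision value


class CacheSim:
    """Hypothetical parameterized cache: set-associative with LRU replacement."""

    def __init__(self, size_bytes, line_bytes, assoc):
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (line_bytes * assoc)
        # One OrderedDict per set; key order runs from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        self.accesses += 1
        block = addr // self.line_bytes           # cache-line number
        lines = self.sets[block % self.num_sets]  # set chosen by the low bits of the line number
        tag = block // self.num_sets
        if tag in lines:
            lines.move_to_end(tag)                # hit: mark as most recently used
        else:
            self.misses += 1
            if len(lines) >= self.assoc:
                lines.popitem(last=False)         # evict the least recently used line
            lines[tag] = True

    def miss_ratio(self):
        return self.misses / self.accesses if self.accesses else 0.0


def simulate_loop_nest(n, order, size_bytes=8192, line_bytes=32, assoc=2):
    """Simulate `for outer: for inner: s += A[i][j]` over a row-major n x n array.

    order="ij" walks rows (unit stride); order="ji" is the interchanged nest
    that walks columns (stride n * ELEM_BYTES). Only references to A are traced.
    """
    if order not in ("ij", "ji"):
        raise ValueError("order must be 'ij' or 'ji'")
    cache = CacheSim(size_bytes, line_bytes, assoc)
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if order == "ij" else (inner, outer)
            cache.access((i * n + j) * ELEM_BYTES)  # byte address of A[i][j]
    return cache.miss_ratio()


if __name__ == "__main__":
    for n in (16, 64):
        for order in ("ij", "ji"):
            print(f"n={n:3d} order={order}: miss ratio {simulate_loop_nest(n, order):.3f}")
```

With these assumed parameters, both loop orders miss on 1 of every 4 references at N = 16 (0.250), because the 2 KiB array fits in the cache and only compulsory misses occur. At N = 64 the row-order nest stays at 0.250, but the column-order nest maps the 64 lines of each column onto just 8 sets of a 2-way cache and misses on every reference (1.000). A jump of this kind, as the loop range grows, illustrates the non-linear behavior described above and why the simulated miss ratio can signal when loop interchange pays off.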