Performance impact of resource conflicts on chip multi-processor servers
PARA'06: Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Iterative stencil loops are used in scientific programs to implement relaxation methods for numerical simulation and signal processing. Such loops repeatedly modify the same array elements over different time steps, so exploiting temporal data locality can significantly improve cache performance. This paper shows that, to tile iterative stencil loops optimally, the imperfectly nested inner loops must be realigned so that they can be minimally skewed across time steps. A memory-reference cost analysis proves that the number of cache misses is minimized when the skew is minimal. A polynomial-time graph-theoretic algorithm is presented to determine the minimum skew factors for a given nesting of iterative stencil loops.
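A minimal sketch of time skewing plus tiling, assuming a 1-D three-point Jacobi stencil stored in two ping-pong buffers. This is a perfectly nested formulation chosen for illustration, so it does not reproduce the paper's realignment of imperfectly nested inner loops or its graph-based skew-factor algorithm. The function names `jacobi_naive` and `jacobi_skewed_tiled`, the stencil, and the tile sizes are all hypothetical. The original dependence distances in (t, i) are (1, -1), (1, 0) and (1, 1). Skewing the space index by the time step (i' = i + t, skew factor 1, the smallest factor that makes every distance nonnegative for this stencil) therefore allows rectangular (t, i') tiles to be executed in lexicographic order while producing exactly the same result as the untiled loop.

```python
def jacobi_naive(a, steps):
    """Reference: untiled 3-point Jacobi relaxation using two ping-pong buffers."""
    n = len(a)
    u = [list(a), list(a)]  # both buffers hold the same fixed boundary values
    for t in range(steps):
        src, dst = u[t % 2], u[(t + 1) % 2]
        for i in range(1, n - 1):
            dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return u[steps % 2]


def jacobi_skewed_tiled(a, steps, tile_t=4, tile_i=8):
    """Same computation, skewed by factor 1 (i' = i + t) and tiled in (t, i').

    After skewing, all dependence distances (flow, anti, output) have
    nonnegative components, so visiting tiles in lexicographic (t, i') order,
    and iterations inside each tile in lexicographic order, is legal.
    """
    n = len(a)
    u = [list(a), list(a)]
    # Skewed index i' = i + t ranges over 1 .. (n - 2) + (steps - 1).
    ip_max = (n - 2) + (steps - 1)
    for tt in range(0, steps, tile_t):            # tile origin along time
        for ii in range(1, ip_max + 1, tile_i):   # tile origin along skewed space
            for t in range(tt, min(tt + tile_t, steps)):
                src, dst = u[t % 2], u[(t + 1) % 2]
                # Clip the tile to the skewed interior 1 + t <= i' <= n - 2 + t.
                lo = max(ii, 1 + t)
                hi = min(ii + tile_i - 1, n - 2 + t)
                for ip in range(lo, hi + 1):
                    i = ip - t  # undo the skew to recover the array index
                    dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return u[steps % 2]
```

For any input list and tile shape, `jacobi_skewed_tiled(a, steps, tile_t, tile_i)` should equal `jacobi_naive(a, steps)` exactly, because each value is computed by the same expression from the same operands. Only the visiting order changes. The `tile_t` parameter controls how many time steps reuse a block of the arrays while it is (presumably) still cached, which is where the temporal locality described above comes from.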