The performance of linear relaxation codes strongly depends on an efficient usage of caches. This paper considers one time step of the Jacobi and Gauß-Seidel kernels on a 3D array, and shows that tiling reduces the number of capacity misses to almost optimum. In particular, we prove that Ω(N³/(L√C)) capacity misses are needed for array size N × N × N, cache size C, and line size L. If cold misses are taken into account, tiling is off the lower bound by a factor of about 1 + 5/√(LC). The exact value depends on tile size and data layout. We show analytically that rectangular tiles of shape (N-2) × s × (sL/2) outperform square tiles, for row-major storage order.
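As a rough illustration of the tiling discussed in the abstract, the following sketch applies one Jacobi time step to a row-major N × N × N array, once untiled and once with rectangular tiles. It is not the paper's code. The function names, the six-point averaging update, the measurement of the line size in array elements, and the mapping of the tile extents to loop indices (first index untiled, s along the second index, sL/2 along the contiguous third index) are all assumptions made for the example.

```python
def _idx(i, j, k, n):
    """Row-major offset of element (i, j, k) in an n*n*n array."""
    return (i * n + j) * n + k


def _update(src, i, j, k, n):
    """Six-point Jacobi average of the neighbours of (i, j, k) (assumed stencil)."""
    return (src[_idx(i - 1, j, k, n)] + src[_idx(i + 1, j, k, n)]
            + src[_idx(i, j - 1, k, n)] + src[_idx(i, j + 1, k, n)]
            + src[_idx(i, j, k - 1, n)] + src[_idx(i, j, k + 1, n)]) / 6.0


def jacobi_sweep(src, n):
    """Untiled reference: one time step over the (n-2)^3 interior points."""
    dst = list(src)  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                dst[_idx(i, j, k, n)] = _update(src, i, j, k, n)
    return dst


def jacobi_sweep_tiled(src, n, s, line_elems):
    """Same time step, visited in tiles of shape (n-2) x s x (s*line_elems/2).

    Assumption: the sL/2 extent runs along the contiguous (last) index.
    """
    tile_j = s
    tile_k = max(1, (s * line_elems) // 2)
    dst = list(src)
    for jj in range(1, n - 1, tile_j):          # tile origin, second index
        for kk in range(1, n - 1, tile_k):      # tile origin, third index
            for i in range(1, n - 1):           # first index is not tiled
                for j in range(jj, min(jj + tile_j, n - 1)):
                    for k in range(kk, min(kk + tile_k, n - 1)):
                        dst[_idx(i, j, k, n)] = _update(src, i, j, k, n)
    return dst
```

Because Jacobi reads only from the old array and writes only to the new one, the tiled traversal yields the same values as the untiled one for any tile size; only the order of memory accesses, and therefore the cache behaviour, differs. For example, `jacobi_sweep_tiled(src, 8, s=2, line_elems=4)` visits tiles of shape 6 × 2 × 4. A Gauß-Seidel sweep updates in place, so reordering it needs more care than this sketch shows.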