Tight Bounds on Capacity Misses for 3D Stencil Codes

Authors:
Claudia Leopold
Affiliations:
-
Venue:
ICCS '02 Proceedings of the International Conference on Computational Science-Part I
Year:
2002



Abstract

The performance of linear relaxation codes depends strongly on efficient use of caches. This paper considers one time step of the Jacobi and Gauß-Seidel kernels on a 3D array, and shows that tiling reduces the number of capacity misses to nearly the optimum. In particular, we prove that $\Omega(N^3/(L\sqrt{C}))$ capacity misses are needed for array size $N \times N \times N$, cache size $C$, and line size $L$. If cold misses are taken into account, tiling is off the lower bound by a factor of about $1 + 5/\sqrt{LC}$. The exact value depends on tile size and data layout. We show analytically that rectangular tiles of shape $(N-2) \times s \times (sL/2)$ outperform square tiles, for row-major storage order.
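To make the tiling idea concrete, the following is a minimal sketch of one Jacobi time step on an N × N × N array stored in row-major order, first untiled and then with tiles of shape (N-2) × s × (sL/2). It is an illustration under stated assumptions, not the paper's code: the six-neighbor averaging stencil, the function names, the choice of which loop each tile extent maps to, and the reading of L and C as counts of array elements are all assumptions made here.

```python
import math


def idx(i, j, k, n):
    """Row-major offset of element (i, j, k); k is the contiguous index."""
    return (i * n + j) * n + k


def stencil(a, i, j, k, n):
    """Assumed six-neighbor average (the exact stencil is not given in the abstract)."""
    return (a[idx(i - 1, j, k, n)] + a[idx(i + 1, j, k, n)]
            + a[idx(i, j - 1, k, n)] + a[idx(i, j + 1, k, n)]
            + a[idx(i, j, k - 1, n)] + a[idx(i, j, k + 1, n)]) / 6.0


def jacobi_step(a, n):
    """Untiled reference: update every interior point once, reading only from a."""
    b = list(a)  # boundary values are carried over unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                b[idx(i, j, k, n)] = stencil(a, i, j, k, n)
    return b


def jacobi_step_tiled(a, n, s, line):
    """Same update, traversed in tiles of shape (n-2) x s x (s*line/2).

    The full extent (n-2) is mapped to i, the extent s to j, and the
    extent s*line/2 to the contiguous index k (an assumed mapping).
    """
    tk = max(1, s * line // 2)
    b = list(a)
    for jj in range(1, n - 1, s):
        for kk in range(1, n - 1, tk):
            for i in range(1, n - 1):
                for j in range(jj, min(jj + s, n - 1)):
                    for k in range(kk, min(kk + tk, n - 1)):
                        b[idx(i, j, k, n)] = stencil(a, i, j, k, n)
    return b


def overhead_factor(line, cache):
    """Approximate factor 1 + 5/sqrt(L*C) quoted in the abstract."""
    return 1.0 + 5.0 / math.sqrt(line * cache)
```

Because a Jacobi step reads only the old array, the tiled traversal computes exactly the same values as the untiled one; only the order of memory accesses changes, which is what affects capacity misses. As a worked instance of the quoted factor, taking L = 4 and C = 1024 (hypothetical values, in elements) gives 1 + 5/64, so tiling would be within roughly 8% of the lower bound.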