Iterative solvers such as the Jacobi and Gauss-Seidel relaxation methods are important but time-consuming building blocks of many scientific and engineering applications. Their performance problems are largely due to cache misses, which can be reduced by tiling the codes. Previous research has shown the usefulness of tiling by experimentally comparing the run times of tiled and original codes, but it did not address whether further improvements are possible. In this paper, we answer that question in the negative for the exploitation of temporal locality in one step of a 2-dimensional stencil code. We derive upper and lower bounds that match up to a factor of about 1 + 2/M, where M is the cache size. For the upper bounds, we investigate some modifications of tiling.
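To make the notion of tiling one step of a 2-dimensional stencil code concrete, the following is a minimal sketch, not the tiling variants analyzed in the paper. It assumes a 5-point Jacobi stencil, a fixed boundary, and a hypothetical tile size parameter `tile`; the function names are illustrative only. Plain Python lists do not expose cache behavior, so the sketch shows only the change in traversal order and checks that both orders produce the same result.

```python
def jacobi_step(a):
    """One untiled Jacobi sweep over the interior of a 2D grid (5-point stencil)."""
    n, m = len(a), len(a[0])
    b = [row[:] for row in a]  # boundary values are carried over unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j] + a[i][j - 1] + a[i][j + 1])
    return b


def jacobi_step_tiled(a, tile=4):
    """The same sweep, visiting the interior in blocks of at most tile x tile points."""
    n, m = len(a), len(a[0])
    b = [row[:] for row in a]
    for ii in range(1, n - 1, tile):          # top row of the current tile
        for jj in range(1, m - 1, tile):      # left column of the current tile
            for i in range(ii, min(ii + tile, n - 1)):
                for j in range(jj, min(jj + tile, m - 1)):
                    b[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j] + a[i][j - 1] + a[i][j + 1])
    return b


# Hypothetical test grid; both traversal orders must give identical results.
grid = [[float((i * j) % 7) for j in range(10)] for i in range(9)]
assert jacobi_step(grid) == jacobi_step_tiled(grid, tile=4)
```

In a compiled language, the tiled order lets neighboring rows of a tile be reused while they are still in cache; the choice of `tile` relative to the cache size M is the kind of trade-off the bounds address.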