Theory of linear and integer programming
Theory of linear and integer programming
Concrete mathematics: a foundation for computer science
Concrete mathematics: a foundation for computer science
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
Computer organization and design (2nd ed.): the hardware/software interface
Computer organization and design (2nd ed.): the hardware/software interface
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating conflict misses for high performance architectures
ICS '98 Proceedings of the 12th international conference on Supercomputing
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Unfavorable Strides in Cache Memory Systems (RNR Technical Report RNR-92-015)
Scientific Programming
Cache oblivious parallelograms in iterative stencil computations
Proceedings of the 24th ACM International Conference on Supercomputing
Tight bounds for low dimensional star stencils in the external memory model
WADS'13 Proceedings of the 13th international conference on Algorithms and Data Structures
Hi-index | 0.00 |
We derive tight bounds on cache misses for evaluation of explicit stencil operators on rectangular grids. Our lower bound is based on the isoperimetric property of the discrete crosspolytope. Our upper bound is based on a good surface-to-volume ratio of a parallelepiped spanned by a reduced basis of the interference lattice of a grid. Measurements show that our algorithm typically reduces the number of cache misses by a factor of three, relative to a compiler optimized code. We show that stencil calculations on grids whose interference lattices have a short vector feature abnormally high numbers of cache misses. We call such grids unfavorable and suggest to avoid these in computations by appropriate padding. By direct measurements on a MIPS R10000 processor we show a good correlation between abnormally high numbers of cache misses and unfavorable three-dimensional grids.