Stencil computations on low-dimensional grids are the kernels of many scientific applications, including finite difference methods used to solve partial differential equations. On typical modern computer architectures, such stencil computations are limited by the performance of the memory subsystem, namely by the bandwidth between main memory and the cache. This work considers the computation of star stencils, such as the 5-point and 7-point stencils, in the external memory model. The analysis focuses on the constant of the leading term of the non-compulsory I/Os. Optimizing stencil computations is an active field of research, but so far there has been a significant gap between the lower bounds and the performance of the algorithms. In two dimensions, matching constants for the lower and upper bounds are provided, closing a gap of 4. In three dimensions, the bounds match up to a factor of $\sqrt{2}$, improving the known results by a factor of $2\sqrt{3}\sqrt{B}$, where $B$ is the block (cache line) size of the external memory model. For higher dimensions $n$, the presented lower bounds improve the previously known bounds by a factor between 4 and 6, leaving a gap of $\sqrt[n-1]{n!} \approx \frac{n}{e}$.
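For readers unfamiliar with the term, the following is a minimal sketch of one sweep of a 5-point star stencil on a two-dimensional grid. It is an illustration only, not the algorithm analyzed in this work: the function name `five_point_sweep`, the equal averaging weights, and the choice to leave boundary cells unchanged are all assumptions made for the example.

```python
def five_point_sweep(grid):
    """Apply one sweep of a 5-point star stencil to a 2D grid (list of lists).

    Each interior cell is replaced by the average of itself and its four
    axis-aligned neighbours. The equal weights and the untouched boundary
    are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(grid), len(grid[0])
    # Copy the grid so every update reads only values from the previous sweep.
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (
                grid[i][j]
                + grid[i - 1][j]  # north neighbour
                + grid[i + 1][j]  # south neighbour
                + grid[i][j - 1]  # west neighbour
                + grid[i][j + 1]  # east neighbour
            ) / 5.0
    return out
```

Each output cell reads five input cells but does only a handful of arithmetic operations. That low ratio of computation to data movement is why a naive row-by-row sweep over a grid much larger than the cache tends to be bound by memory bandwidth, and why the constant in front of the I/O term matters.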