We present and evaluate a cache oblivious algorithm for stencil computations, which arise, for example, in finite-difference methods. Our algorithm applies to arbitrary stencils in n-dimensional spaces. On an "ideal cache" of size Z, our algorithm saves a factor of $\Theta(Z^{1/n})$ cache misses compared to a naive algorithm, and it exploits temporal locality optimally throughout the entire memory hierarchy. We evaluate our algorithm in terms of the number of cache misses and demonstrate that the memory behavior agrees with our theoretical predictions. Our experimental evaluation is based on a finite-difference solution of a heat diffusion problem, as well as a Gauss-Seidel iteration and a two-dimensional LBMHD program, both reformulated as cache oblivious stencil computations.
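The abstract does not spell out how the algorithm traverses space-time. One well-known way to do it is a recursive trapezoid decomposition: wide regions are split in space along a line of slope -1, and tall regions are split in time into a lower and an upper half, so that each piece eventually fits in cache without the code knowing Z. The sketch below illustrates that idea for the simplest case (n = 1, a three-point heat-diffusion stencil). It is an illustration under stated assumptions, not the authors' code: the coefficient `ALPHA`, the zero-valued ghost cells (fixed boundaries), the two ping-pong buffers, and the names `make_grid`, `kernel`, `walk1`, `naive`, and `cache_oblivious` are all hypothetical.

```python
"""Sketch: cache-oblivious trapezoid walk for a 1-D, 3-point stencil.

Hypothetical assumptions (not taken from the abstract): heat diffusion
with coefficient ALPHA, fixed boundary values of 0.0 held in ghost cells,
and two time-level buffers used in ping-pong fashion.
"""

ALPHA = 0.25  # hypothetical diffusion coefficient


def make_grid(values):
    """Return two buffers, each with a ghost cell at index 0 and N+1."""
    padded = [0.0] + list(values) + [0.0]
    return [padded[:], padded[:]]


def kernel(u, t, x):
    """Compute interior point x (0-based) at time t+1 from time t."""
    src, dst = u[t % 2], u[(t + 1) % 2]
    i = x + 1  # skip the left ghost cell
    dst[i] = src[i] + ALPHA * (src[i - 1] - 2.0 * src[i] + src[i + 1])


def walk1(u, t0, t1, x0, dx0, x1, dx1):
    """Update every point of the trapezoid
    {(t, x) : t0 <= t < t1, x0 + dx0*(t-t0) <= x < x1 + dx1*(t-t0)}
    in an order that respects the stencil's dependencies.
    """
    dt = t1 - t0
    if dt == 1:
        # Base case: one time step across the trapezoid's current width.
        for x in range(x0, x1):
            kernel(u, t0, x)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
            # Space cut: split along a line of slope -1 through xm.
            # The left piece never reads the right piece, so it goes first.
            xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
            walk1(u, t0, t1, x0, dx0, xm, -1)
            walk1(u, t0, t1, xm, -1, x1, dx1)
        else:
            # Time cut: finish the lower half before the upper half.
            s = dt // 2
            walk1(u, t0, t0 + s, x0, dx0, x1, dx1)
            walk1(u, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)


def naive(values, steps):
    """Reference version: sweep the whole grid once per time step."""
    u = make_grid(values)
    for t in range(steps):
        for x in range(len(values)):
            kernel(u, t, x)
    return u[steps % 2][1:-1]


def cache_oblivious(values, steps):
    """Same computation, traversed by the recursive trapezoid walk."""
    u = make_grid(values)
    walk1(u, 0, steps, 0, 0, len(values), 0)  # vertical edges at the domain ends
    return u[steps % 2][1:-1]
```

Because every point is computed by the same expression from the same inputs, `cache_oblivious(vals, T)` should return exactly the same list as `naive(vals, T)`; only the visiting order, and hence the cache behavior, differs. Two buffers suffice because the three consumers of a time-t value at x are all dependencies of the time-(t+2) update at x that overwrites it.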