We present a cache-oblivious algorithm for stencil computations, which arise, for example, in finite-difference methods. Our algorithm applies to arbitrary stencils in n-dimensional spaces. On an "ideal cache" of size Z, our algorithm saves a factor of Θ(Z^{1/n}) cache misses compared to a naive algorithm, and it exploits temporal locality optimally throughout the entire memory hierarchy.
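The abstract does not spell out the algorithm, so the following is only a rough sketch of one way a cache-oblivious stencil traversal can look. It assumes a one-dimensional three-point heat stencil with fixed boundary values and a recursive space-time trapezoid decomposition. The names `heat`, `walk1`, `run_naive`, `run_trapezoid`, and the slope constant `DS` are illustrative choices, not taken from the text above. The naive sweep is included for comparison.

```python
DS = 1  # assumed stencil slope: each point reads neighbours at distance 1


def heat(src, x):
    """Three-point heat-equation update (an illustrative kernel)."""
    return src[x] + 0.25 * (src[x + 1] - 2.0 * src[x] + src[x - 1])


def run_naive(u0, steps):
    """Sweep the whole array once per time step."""
    u = [list(u0), list(u0)]  # two buffers, indexed by t % 2
    for t in range(steps):
        src, dst = u[t % 2], u[(t + 1) % 2]
        for x in range(1, len(u0) - 1):
            dst[x] = heat(src, x)
    return u[steps % 2]


def walk1(u, t0, t1, x0, dx0, x1, dx1):
    """Visit the space-time trapezoid with time range [t0, t1), whose left
    edge starts at x0 with slope dx0 and right edge at x1 with slope dx1."""
    dt = t1 - t0
    if dt == 1:
        src, dst = u[t0 % 2], u[(t0 + 1) % 2]
        for x in range(x0, x1):
            dst[x] = heat(src, x)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * DS * dt:
            # Wide trapezoid: cut in space along a line of slope -DS,
            # then process the left piece before the right piece.
            xm = (2 * (x0 + x1) + (2 * DS + dx0 + dx1) * dt) // 4
            walk1(u, t0, t1, x0, dx0, xm, -DS)
            walk1(u, t0, t1, xm, -DS, x1, dx1)
        else:
            # Tall trapezoid: cut in time, bottom half first.
            s = dt // 2
            walk1(u, t0, t0 + s, x0, dx0, x1, dx1)
            walk1(u, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)


def run_trapezoid(u0, steps):
    """Same result as run_naive, computed in the recursive order."""
    u = [list(u0), list(u0)]
    walk1(u, 0, steps, 1, 0, len(u0) - 1, 0)
    return u[steps % 2]
```

Both functions compute the same values; only the order in which space-time points are visited differs. The recursive order keeps working on a small region for several time steps before moving on, which is the kind of temporal locality the abstract refers to, and it does so without any parameter that depends on the cache size.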