Cache oblivious parallelograms in iterative stencil computations

Authors:
Robert Strzodka; Mohammed Shaheen; Dawid Pajak; Hans-Peter Seidel
Affiliations:
Max Planck Institut Informatik, Saarbrücken, Germany; Max Planck Institut Informatik, Saarbrücken, Germany; West Pomeranian University of Technology, Szczecin, Poland; Max Planck Institut Informatik, Saarbrücken, Germany
Venue:
Proceedings of the 24th ACM International Conference on Supercomputing
Year:
2010

Abstract

We present a new cache oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations, as though gigabytes of data could reside in an enormous on-chip cache. We compare execution times for 2D and 3D spatial domains with up to 128 million double precision elements, for constant and variable stencils, against hand-optimized naive code and against the automatic polyhedral parallelizer and locality optimizer PluTo, and demonstrate the clear superiority of our results. The performance benefits stem from a tiling structure that caters for data locality, parallelism and vectorization simultaneously. Rather than tiling the iteration space from the inside, we take an exterior approach with a predefined hierarchy, simple regular parallelogram tiles and a locality-preserving parallelization. These advantages come at the cost of an irregular workload distribution, but a tightly integrated load balancer ensures high utilization of all resources.
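
To illustrate what a parallelogram tile looks like in a stencil's space-time iteration space, the following is a minimal sketch of classic time skewing on a 1D three-point stencil. It is not the scheme described in the abstract: it has no tile hierarchy, no parallelization and no load balancing, and every name in it (`naive`, `skewed`, `tile_w`, `band_h`) is hypothetical. The sketch assumes fixed boundary values and a simple averaging stencil. Each tile covers a fixed range of the skewed coordinate x' = x + s, so its spatial window shifts left by one point per time step, which keeps all dependencies inside the current tile or the tiles to its left.

```python
def naive(u0, steps):
    """Reference sweep: one full pass over the grid per time step."""
    a, b = list(u0), list(u0)
    for _ in range(steps):
        for x in range(1, len(a) - 1):
            b[x] = (a[x - 1] + a[x] + a[x + 1]) / 3.0
        a, b = b, a
    return a


def skewed(u0, steps, tile_w, band_h):
    """Same stencil, traversed in parallelogram tiles (time skewing).

    tile_w: tile width in the skewed coordinate x' = x + s (assumed parameter)
    band_h: number of time steps processed per tile (assumed parameter)
    """
    n = len(u0)
    # Two buffers indexed by time parity; both hold the fixed boundary values.
    buf = [list(u0), list(u0)]
    t0 = 0
    while t0 < steps:
        h = min(band_h, steps - t0)
        # Skewed coordinates of interior points range from 2 to n - 2 + h.
        for xs in range(2, n - 1 + h, tile_w):
            for s in range(1, h + 1):
                t = t0 + s
                src, dst = buf[(t - 1) % 2], buf[t % 2]
                # The tile's spatial window moves left by one point per step.
                lo = max(1, xs - s)
                hi = min(n - 2, xs + tile_w - 1 - s)
                for x in range(lo, hi + 1):
                    dst[x] = (src[x - 1] + src[x] + src[x + 1]) / 3.0
        t0 += h
    return buf[steps % 2]


if __name__ == "__main__":
    u0 = [float((7 * i) % 11) for i in range(40)]
    print(skewed(u0, 10, tile_w=8, band_h=4) == naive(u0, 10))
```

Because each grid point is computed from the same inputs with the same expression in both traversals, the two functions should agree exactly; only the order of updates differs. With small `tile_w` and `band_h`, a tile's working set stays small across several time steps, which is the locality effect that tiling schemes of this kind aim for.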