Patus is a code generation and auto-tuning framework for stencil computations that targets modern multi- and many-core processors. The framework aims for productivity and portability while achieving high performance on the target platform. Its stencil specification language lets the programmer express the computation concisely and independently of hardware-specific details, which increases programmer productivity by removing the need for manual low-level tuning. We illustrate the impact of stencil code generation on seismic applications, for which both weak and strong scaling are important. We evaluate performance on a scalable discretization of the wave equation and on complex simulation types of the AWP-ODC code, aiming at excellent parallel efficiency in preparation for petascale 3-D earthquake calculations.
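To make "stencil computation" concrete, the following is a minimal hand-written sketch of one explicit time step for the scalar wave equation on a 3-D grid. It is an illustration only: it is neither Patus specification syntax nor code from AWP-ODC, and it assumes a second-order discretization in space and time that the abstract does not specify. The names `idx`, `wave_step`, and `coef` are hypothetical. A kernel of this shape is the kind of untuned loop nest that a stencil code generator would replace with blocked, vectorized, and parallel variants.

```c
#include <stddef.h>

/* Hypothetical helper: flatten (i, j, k) into a 1-D index
 * for a grid with ny * nz points per i-plane. */
static size_t idx(size_t i, size_t j, size_t k, size_t ny, size_t nz) {
    return (i * ny + j) * nz + k;
}

/* One explicit time step of u_tt = c^2 * laplacian(u), assuming a
 * second-order central difference in space and time (an assumption
 * made for illustration). coef = (c * dt / h)^2.
 * Only interior points are updated; boundary points of u_next are
 * left untouched. */
void wave_step(const float *u_prev, const float *u_curr, float *u_next,
               size_t nx, size_t ny, size_t nz, float coef) {
    for (size_t i = 1; i + 1 < nx; ++i) {
        for (size_t j = 1; j + 1 < ny; ++j) {
            for (size_t k = 1; k + 1 < nz; ++k) {
                size_t c = idx(i, j, k, ny, nz);
                /* 7-point Laplacian: six neighbours minus 6x the centre. */
                float lap = u_curr[idx(i - 1, j, k, ny, nz)]
                          + u_curr[idx(i + 1, j, k, ny, nz)]
                          + u_curr[idx(i, j - 1, k, ny, nz)]
                          + u_curr[idx(i, j + 1, k, ny, nz)]
                          + u_curr[idx(i, j, k - 1, ny, nz)]
                          + u_curr[idx(i, j, k + 1, ny, nz)]
                          - 6.0f * u_curr[c];
                u_next[c] = 2.0f * u_curr[c] - u_prev[c] + coef * lap;
            }
        }
    }
}
```

Every output point depends only on a fixed neighbourhood of input points, which is what makes such loops amenable to automatic tiling and tuning across architectures.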