Tiling is an important program transformation that is often used to enhance cache locality and to obtain coarse-grained parallelism. In this paper, we address the problem of generating adaptive parametric tiled code for parallel execution contexts, that is, parallel tiled code in which tile sizes can be changed on the fly during execution. Changing tile sizes during pipelined parallel execution of tiles presents a fundamental code-generation challenge: the unscanned iteration space may become non-convex. We develop novel solutions to the adaptive parallel tiled code generation problem. Adaptive tiling accelerates auto-tuning for tile size selection: several tile sizes can be tested for performance in a single run of the tiled code. Adaptive tiling is also useful when tile sizes must be altered dynamically to suit a changing execution environment, for example when caches are dynamically resized for power savings. Experimental evaluation on a number of benchmarks demonstrates the effectiveness of the developed approach.
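To make the idea of trying several tile sizes in a single run concrete, here is a minimal sequential sketch in Python. It is not the code-generation scheme of the paper. The function `adaptive_tiled_scan`, its parameters, and the round-robin choice of tile size are all assumptions made for illustration. The sketch switches tile size only at row-strip boundaries, so the unscanned region always stays rectangular; it therefore sidesteps the non-convexity problem that arises under pipelined parallel execution and shows only the tuning idea.

```python
import time


def adaptive_tiled_scan(n, candidate_sizes, body):
    """Scan an n x n iteration space in tiles, changing tile size per row strip.

    Hypothetical helper: each strip of rows uses the next candidate tile size
    (round-robin), and the time spent in that strip is recorded.
    Returns a list of (tile_size, rows_in_strip, seconds) tuples.
    """
    timings = []
    i0 = 0      # first row not yet scanned
    strip = 0   # index of the current strip
    while i0 < n:
        ts = candidate_sizes[strip % len(candidate_sizes)]
        i1 = min(i0 + ts, n)  # strip covers rows i0 .. i1-1
        start = time.perf_counter()
        for j0 in range(0, n, ts):          # tiles within the strip
            j1 = min(j0 + ts, n)
            for i in range(i0, i1):         # points within one tile
                for j in range(j0, j1):
                    body(i, j)
        timings.append((ts, i1 - i0, time.perf_counter() - start))
        i0 = i1
        strip += 1
    return timings


# Usage: count how often each point is visited to check full coverage.
n = 10
visits = [[0] * n for _ in range(n)]


def count(i, j):
    visits[i][j] += 1


timings = adaptive_tiled_scan(n, [2, 3, 4], count)
for ts, rows, seconds in timings:
    print(f"tile size {ts}: {rows} rows in {seconds:.6f} s")
```

In a real tuner the per-strip timings would be normalized by the amount of work in each strip before tile sizes are compared, since the last strip may be truncated at the boundary of the iteration space.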