Adaptive Mapping and Parameter Selection Scheme to Improve Automatic Code Generation for GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Tiling is a key program transformation to achieve effective data reuse. But the performance of tiled programs can vary considerably with different tile sizes. Hence the selection of good tile sizes is crucial. Although there has been considerable research on analytical models for selecting tile sizes, they have not been shown to be effective in finding optimal tile sizes across a range of programs and target architectures. Auto-tuning is a viable alternative that is often used in practice, and involves the execution of different combinations of tile sizes in a systematic fashion to find the best ones. But this is sometimes infeasible--for instance when the program is to be run on unknown platforms (e.g., cloud environments). We propose a novel approach for generating code to enable dynamic tile size selection, based on monitoring the performance of a few loop iterations. The selection operates at run time on the "production" run, without any a priori knowledge of the execution environment. We discuss the theory and implementation of a parametric tiled code generator that enables run-time tile size tuning and describe a search strategy to determine effective tile sizes. Experimental results demonstrate the effectiveness of the approach.