POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Evidence-based static branch prediction using machine learning
ACM Transactions on Programming Languages and Systems (TOPLAS)
Learning to schedule straight-line code
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
A tile selection algorithm for data locality and cache interference
ICS '99 Proceedings of the 13th international conference on Supercomputing
Quantifying the multi-level nature of tiling interactions
International Journal of Parallel Programming
Loop tiling for parallelism
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
A Comparison of Compiler Tiling Algorithms
CC '99 Proceedings of the 8th International Conference on Compiler Construction, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS'99
Meta optimization: improving compiler heuristics with machine learning
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Scheduling Straight-Line Code Using Reinforcement Learning and Rollouts
Scheduling Straight-Line Code Using Reinforcement Learning and Rollouts
Optimizing Program Locality Through CMEs and GAs
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
A Quantitative Analysis of Tile Size Selection Algorithms
The Journal of Supercomputing
Inducing heuristics to decide whether to schedule
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy
Proceedings of the international symposium on Code generation and optimization
Predicting Unroll Factors Using Supervised Classification
Proceedings of the international symposium on Code generation and optimization
Minimizing development and maintenance costs in supporting persistently optimized BLAS
Software—Practice & Experience - Research Articles
Concurrency and Computation: Practice & Experience - Compilers for Parallel Computers
Think globally, search locally
Proceedings of the 19th annual international conference on Supercomputing
Impact of modern memory subsystems on cache optimizations for stencil computations
Proceedings of the 2005 workshop on Memory system performance
A New Genetic Algorithm for Loop Tiling
The Journal of Supercomputing
An analytical model for loop tiling and its solution
ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Method-specific dynamic compilation using logistic regression
Proceedings of the 21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications
Profitable loop fusion and tiling using model-driven empirical search
Proceedings of the 20th annual international conference on Supercomputing
Parameterized tiled loops for free
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Parametric multi-level tiling of imperfectly nested loops
Proceedings of the 23rd international conference on Supercomputing
Optimizing matrix multiplication with a classifier learning system
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Analytic models and empirical search: a hybrid approach to code optimization
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Efficient tiled loop generation: D-tiling
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Model-driven tile size selection for DOACROSS loops on GPUs
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Using machine learning to improve automatic vectorization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Journal of Parallel and Distributed Computing
Automatic static feature generation for compiler optimization problems
AI'11 Proceedings of the 24th international conference on Advances in Artificial Intelligence
Using graph-based program characterization for predictive modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction
Automatic OpenCL work-group size selection for multicore CPUs
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is usually achieved by hand-crafting tile size selection (TSS) models that characterize the performance of the tiled program as a function of tile sizes. The best tile sizes are selected by either directly using the TSS model or by using the TSS model together with an empirical search. Hand-crafting accurate TSS models is hard, and adapting them to different architecture/compiler, or even keeping them up-to-date with respect to the evolution of a single compiler is often just as hard. Instead of hand-crafting TSS models, can we automatically learn or create them? In this paper, we show that for a specific class of programs fairly accurate TSS models can be automatically created by using a combination of simple program features, synthetic kernels, and standard machine learning techniques. The automatic TSS model generation scheme can also be directly used for adapting the model and/or keeping it up-to-date. We evaluate our scheme on six different architecture-compiler combinations (chosen from three different architectures and four different compilers). The models learned by our method have consistently shown near-optimal performance (within 5% of the optimal on average) across all architecture-compiler combinations.