Using machine learning to improve automatic vectorization

Authors:
Kevin Stock;Louis-Noël Pouchet;P. Sadayappan
Affiliations:
The Ohio State University;The Ohio State University;The Ohio State University
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Year:
2012

Abstract

Automatic vectorization is critical to enhancing performance of compute-intensive programs on modern processors. However, there is much room for improvement over the auto-vectorization capabilities of current production compilers through careful vector-code synthesis that utilizes a variety of loop transformations (e.g., unroll-and-jam and interchange). As the set of transformations considered is increased, the selection of the most effective combination of transformations becomes a significant challenge: currently used cost models in vectorizing compilers are often unable to identify the best choices. In this paper, we address this problem using machine learning models to predict the performance of SIMD codes. In contrast to existing approaches that have used high-level features of the program, we develop machine learning models based on features extracted from the generated assembly code. The models are trained offline on a number of benchmarks and used at compile-time to discriminate between numerous possible vectorized variants generated from the input code. We demonstrate the effectiveness of the machine learning model by using it to guide automatic vectorization on a variety of tensor contraction kernels, with improvements ranging from 2× to 8× over Intel ICC's auto-vectorized code. We also evaluate the effectiveness of the model on a number of stencil computations and show good improvement over auto-vectorized code.
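
As a rough illustration of the workflow the abstract describes (features taken from generated assembly, a model trained offline, then used at compile time to rank candidate variants), a minimal Python sketch might look like the following. Everything specific here is an assumption made for illustration: the instruction classes, the assembly snippets, the training numbers, and the helper names `extract_features`, `predict`, and `select_variant` do not come from the paper. A 1-nearest-neighbour predictor stands in for whatever models the authors actually trained.

```python
import math
from collections import Counter

# Hypothetical instruction classes used as features. The abstract does not
# list the paper's actual feature set; these groups are placeholders.
FEATURE_CLASSES = {
    "vec_arith": ("addps", "mulps", "vaddps", "vmulps"),
    "vec_mem": ("movaps", "movups", "vmovaps", "vmovups"),
    "shuffle": ("shufps", "vshufps", "unpcklps", "vperm2f128"),
    "scalar": ("addss", "mulss", "movss"),
}


def extract_features(asm_text):
    """Count how many instructions of each class appear in an assembly listing."""
    counts = Counter()
    for line in asm_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        mnemonic = tokens[0].lower()
        for name, mnemonics in FEATURE_CLASSES.items():
            if mnemonic in mnemonics:
                counts[name] += 1
    # Fixed feature order: the insertion order of FEATURE_CLASSES.
    return [counts[name] for name in FEATURE_CLASSES]


def predict(training_set, features):
    """1-nearest-neighbour regression: return the measured performance of the
    training example whose feature vector is closest (a stand-in model)."""
    nearest = min(training_set, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]


def select_variant(training_set, variants):
    """Pick the variant name with the highest predicted performance."""
    return max(
        variants,
        key=lambda name: predict(training_set, extract_features(variants[name])),
    )


# Made-up "offline" training data: (feature vector, measured performance).
# The performance values are illustrative placeholders, not measurements.
TRAINING_SET = [
    ([2, 2, 0, 0], 4.0),
    ([1, 2, 2, 0], 2.5),
    ([0, 0, 0, 4], 1.0),
]

# Two hypothetical vectorized variants of the same loop body.
VARIANTS = {
    "variant_a": """
        vmovups (%rdi), %ymm0
        vmulps %ymm1, %ymm0, %ymm0
        vaddps %ymm0, %ymm2, %ymm2
        vmovups %ymm2, (%rsi)
    """,
    "variant_b": """
        vmovups (%rdi), %ymm0
        vshufps $0, %ymm0, %ymm0, %ymm3
        vshufps $85, %ymm0, %ymm0, %ymm4
        vmulps %ymm1, %ymm3, %ymm3
        vmovups %ymm3, (%rsi)
    """,
}

print(select_variant(TRAINING_SET, VARIANTS))
```

The point of the sketch is the division of labour: the feature extractor only sees the compiler's output, so the same predictor can compare variants produced by any combination of loop transformations without modelling each transformation separately.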