When polyhedral transformations meet SIMD code generation

Authors:
Martin Kong;Richard Veras;Kevin Stock;Franz Franchetti;Louis-Noël Pouchet;P. Sadayappan
Affiliations:
Ohio State University, Columbus, OH, USA;Carnegie Mellon University, Pittsburgh, PA, USA;Ohio State University, Columbus, OH, USA;Carnegie Mellon University, Pittsburgh, PA, USA;University of California Los Angeles, Los Angeles, CA, USA;Ohio State University, Columbus, OH, USA
Venue:
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Year:
2013

Citing 27
Cited 1

Some efficient solutions to the affine scheduling problem: I. One-dimensional time

International Journal of Parallel Programming
Maximizing parallelism and minimizing synchronization with affine transforms

Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A fast Fourier transform compiler

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Programming pearls: little languages

Communications of the ACM
Generative programming: methods, tools, and applications

Generative programming: methods, tools, and applications
Exploiting superword level parallelism with multimedia instruction sets

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Achieving extensibility through product-lines and domain-specific languages: a case study

ACM Transactions on Software Engineering and Methodology (TOSEM)
Automatically tuned linear algebra software

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
Generating Product-Lines of Product-Families

Proceedings of the 17th IEEE international conference on Automated software engineering
Little language processing, an alternative to courses on compiler construction

ACM SIGCSE Bulletin
Vectorization for SIMD architectures with alignment constraints

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Code Generation in the Polyhedral Model Is Easier Than You Think

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Sparsity: Optimization Framework for Sparse Matrix Kernels

International Journal of High Performance Computing Applications
Auto-vectorization of interleaved data for SIMD

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
Iterative optimization in the polyhedral model: part ii, multidimensional time

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Outer-loop vectorization: revisited for short SIMD architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Parametric multi-level tiling of imperfectly nested loops

Proceedings of the 23rd international conference on Supercomputing
Polyhedral-Model Guided Loop-Nest Auto-Vectorization

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Algebraic signal processing theory: Cooley-Tukey type algorithms for real DFTs

IEEE Transactions on Signal Processing
Parameterized tiling revisited

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Loop transformations: convexity, pruning and optimization

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Data layout transformation for stencil computations on short-vector SIMD architectures

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Tiling stencil computations to maximize parallelism

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

A Basic Linear Algebra Compiler

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.00

Visualization

Abstract

Data locality and parallelism are critical optimization objectives for performance on modern multi-core machines. Both coarse-grain parallelism (e.g., multi-core) and fine-grain parallelism (e.g., vector SIMD) must be effectively exploited, but despite decades of progress at both ends, current compiler optimization schemes that attempt to address data locality and both kinds of parallelism often fail at one of the three objectives. We address this problem by proposing a 3-step framework, which aims for integrated data locality, multi-core parallelism and SIMD execution of programs. We define the concept of vectorizable codelets, with properties tailored to achieve effective SIMD code generation for the codelets. We leverage the power of a modern high-level transformation framework to restructure a program to expose good ISA-independent vectorizable codelets, exploiting multi-dimensional data reuse. Then, we generate ISA-specific customized code for the codelets, using a collection of lower-level SIMD-focused optimizations. We demonstrate our approach on a collection of numerical kernels that we automatically tile, parallelize and vectorize, exhibiting significant performance improvements over existing compilers.