A SIMD optimization framework for retargetable compilers

Authors:
Manuel Hohenauer;Felix Engel;Rainer Leupers;Gerd Ascheid;Heinrich Meyr
Affiliations:
RWTH Aachen University, Germany;RWTH Aachen University, Germany;RWTH Aachen University, Germany;RWTH Aachen University, Germany;RWTH Aachen University, Germany
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2009

Abstract

Retargetable C compilers are widely used to obtain compiler support for new embedded processors quickly and to perform early processor architecture exploration. A partially inherent problem of the retargetable compilation approach, though, is limited code quality compared with hand-written compilers or assembly code, owing to the lack of dedicated optimization techniques. This problem can be circumvented by designing flexible, retargetable code optimization techniques that apply to a certain range of target architectures. This article focuses on target machines with SIMD instruction support, a common feature in embedded processors for multimedia applications. SIMD optimization is known to be difficult, however, because SIMD architectures are largely nonuniform, support only a limited set of data types, and impose several memory alignment constraints. Additionally, SIMD optimization requires complicated loop transformations, tailored to the SIMD architecture, to expose the necessary amount of parallelism in the code. Integrating the SIMD optimization and the required loop transformations in a single retargeting formalism is therefore an ambitious challenge. In this article, we present an efficient and quickly retargetable SIMD code optimization framework that is integrated into an industrial retargetable C compiler. Experimental results for different processors demonstrate that the proposed technique applies to real-life target machines and that it produces code quality improvements close to the theoretical limit.
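
To make the alignment and loop-transformation issues named in the abstract concrete, the following is a minimal sketch in plain C. It is not the framework described in the article. It only illustrates two standard transformations a SIMD optimizer typically needs: loop peeling to reach an aligned address and strip-mining into vector-width chunks. The lane count (`LANES`), the 16-bit element type, and the function names are assumptions chosen for illustration, and the inner lane loop merely stands in for a single SIMD add on a hypothetical target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 4 };                            /* assumed: four 16-bit lanes per SIMD word */
enum { VEC_BYTES = LANES * sizeof(int16_t) };  /* assumed alignment requirement in bytes */

/* Scalar reference loop: dst[i] = a[i] + b[i]. */
void add_scalar(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
}

/* The same computation after loop peeling and strip-mining. */
void add_strip_mined(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;

    /* Prologue (peeled iterations): run scalar code until dst is aligned.
       Only dst is aligned here; a real SIMD target would also have to deal
       with the alignment of a and b. */
    while (i < n && ((uintptr_t)(dst + i) % VEC_BYTES) != 0) {
        dst[i] = (int16_t)(a[i] + b[i]);
        i++;
    }

    /* Main loop: each iteration covers LANES elements. The inner loop
       stands in for one SIMD add instruction. */
    for (; i + LANES <= n; i += LANES)
        for (size_t lane = 0; lane < LANES; lane++)
            dst[i + lane] = (int16_t)(a[i + lane] + b[i + lane]);

    /* Epilogue: leftover elements that do not fill a whole vector. */
    for (; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
}
```

Both functions compute the same result for any `n`. The transformed version only restructures the iteration space so that the main loop body maps onto fixed-width, aligned vector operations.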