Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
BURG: fast optimal instruction selection and tree parsing
ACM SIGPLAN Notices
Engineering a simple, efficient code-generator generator
ACM Letters on Programming Languages and Systems (LOPLAS)
Advanced compiler design and implementation
Advanced compiler design and implementation
Customized instruction-sets for embedded processors
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Code selection for media processors with SIMD instructions
DATE '00 Proceedings of the conference on Design, automation and test in Europe
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Compilation techniques for multimedia processors
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1
A framework for fast hardware-software co-simulation
Proceedings of the conference on Design, automation and test in Europe
Effectiveness of the ASIP design system PEAS-III in design of pipelined processors
Proceedings of the 2001 Asia and South Pacific Design Automation Conference
Functional abstraction driven design space exploration of heterogeneous programmable architectures
Proceedings of the 14th international symposium on Systems synthesis
Retargetable compiler technology for embedded systems: tools and applications
Retargetable compiler technology for embedded systems: tools and applications
Code Optimization Techniques for Embedded Processors: Methods, Algorithms, and Tools
Code Optimization Techniques for Embedded Processors: Methods, Algorithms, and Tools
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Architecture Exploration for Embedded Processors with Lisa
Architecture Exploration for Embedded Processors with Lisa
Increasing and Detecting Memory Address Congruence
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Methodology and Tool Suite for C Compiler Generation from ADL Processor Models
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Vectorization for SIMD architectures with alignment constraints
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
Efficient SIMD Code Generation for Runtime Alignment and Length Conversion
Proceedings of the international symposium on Code generation and optimization
SWARP: a retargetable preprocessor for multimedia instructions: Research Articles
Concurrency and Computation: Practice & Experience - Compilers for Parallel Computers
Generation of permutations for SIMD processors
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Multi-platform Auto-vectorization
Proceedings of the International Symposium on Code Generation and Optimization
Optimizing data permutations for SIMD devices
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Auto-vectorization of interleaved data for SIMD
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Retargetable code optimization with SIMD instructions
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Building ASIPs: The Mescal Methodology
Building ASIPs: The Mescal Methodology
HiLO: high level optimization of FFTs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Data layout transformation for stencil computations on short-vector SIMD architectures
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Automatic SIMD vectorization of fast fourier transforms for the larrabee and AVX instruction sets
Proceedings of the international conference on Supercomputing
Using machine learning to improve automatic vectorization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Scout: a source-to-source transformator for SIMD-Optimizations
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
A compiler framework for extracting superword level parallelism
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Hi-index | 0.00 |
Retargetable C compilers are currently widely used to quickly obtain compiler support for new embedded processors and to perform early processor architecture exploration. A partially inherent problem of the retargetable compilation approach, though, is the limited code quality as compared to hand-written compilers or assembly code due to the lack of dedicated optimizations techniques. This problem can be circumvented by designing flexible, retargetable code optimization techniques that apply to a certain range of target architectures. This article focuses on target machines with SIMD instruction support, a common feature in embedded processors for multimedia applications. However, SIMD optimization is known to be a difficult task since SIMD architectures are largely nonuniform, support only a limited set of data types and impose several memory alignment constraints. Additionally, such techniques require complicated loop transformations, which are tailored to the SIMD architecture in order to exhibit the necessary amount of parallelism in the code. Thus, integrating the SIMD optimization and the required loop transformations together in a single retargeting formalism is an ambitious challenge. In this article, we present an efficient and quickly retargetable SIMD code optimization framework that is integrated into an industrial retargetable C compiler. Experimental results for different processors demonstrate that the proposed technique applies to real-life target machines and that it produces code quality improvements close to the theoretical limit.