Automatic simdization for multimedia extensions faces several new challenges that are not present in traditional vectorization. Some of the new issues are due to the more restrictive SIMD architectures designed for multimedia extensions. Among them are alignment constraints, lack of memory gather and scatter support, and the short and fixed-length nature of SIMD vectors. Since these constraints affect some very basic components of a program, a compiler must not only provide solid solutions to individual issues, but also take an integrated approach to address these constraints in combination.

In this paper, we propose a simdization framework that addresses several orthogonal aspects of simdization, such as alignment handling, simdization of loops with mixed data lengths, and SIMD parallelism extraction from different program scopes (from basic blocks to inner loops). The novelty of this framework is its ability to facilitate interactions between different techniques based on the simple intermediate representation of virtual vectors. Measurements on a PPC970 with a VMX SIMD unit indicate speedup factors of up to 8.11 for numerical/video/communication kernels and speedup factors of up to 2.16 for benchmarks, when automatic simdization is turned on.
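
To make the alignment constraint mentioned above concrete, the following is a minimal sketch in portable C, not the framework proposed in the paper. It models a machine whose vector loads can only start at vector-aligned addresses, and shows how a load from a misaligned position can be assembled from two aligned loads and a lane shift. The names `vec4i`, `VLEN`, `load_aligned`, `shift_pair`, and `load_stream` are hypothetical and chosen only for illustration; the 4-lane integer vector is an assumption standing in for a 16-byte SIMD register.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VLEN 4 /* assumption: 4 ints per 16-byte vector */

/* Hypothetical stand-in for a SIMD register holding VLEN ints. */
typedef struct { int lane[VLEN]; } vec4i;

/* Models an aligned-only load: the element index is rounded down to a
   multiple of VLEN, so the load always starts on a vector boundary. */
static vec4i load_aligned(const int *base, size_t idx) {
    vec4i v;
    size_t start = idx - (idx % VLEN);
    memcpy(v.lane, base + start, sizeof v.lane);
    return v;
}

/* Treats lo and hi as one 2*VLEN-lane sequence and selects VLEN
   consecutive lanes starting at position shift (0 <= shift < VLEN). */
static vec4i shift_pair(vec4i lo, vec4i hi, size_t shift) {
    vec4i v;
    for (size_t k = 0; k < VLEN; k++) {
        size_t s = k + shift;
        v.lane[k] = (s < VLEN) ? lo.lane[s] : hi.lane[s - VLEN];
    }
    return v;
}

/* Emulates a load of base[idx .. idx+VLEN-1] using aligned loads only.
   The caller must ensure the array extends to the end of the second
   aligned vector whenever idx is not a multiple of VLEN. */
static vec4i load_stream(const int *base, size_t idx) {
    size_t shift = idx % VLEN;
    vec4i lo = load_aligned(base, idx);
    if (shift == 0) {
        return lo; /* already aligned: one load suffices */
    }
    vec4i hi = load_aligned(base, idx + VLEN);
    return shift_pair(lo, hi, shift);
}
```

For an array `b` of at least 8 elements, `load_stream(b, 2)` yields the lanes `b[2]` through `b[5]`, built from the aligned vectors `b[0..3]` and `b[4..7]`. A simdizing compiler has to insert this kind of realignment whenever the streams in a loop statement have different offsets relative to the vector boundary, which is one reason the abstract treats alignment handling as a core concern.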