An integrated simdization framework using virtual vectors

Authors:
Peng Wu;Alexandre E. Eichenberger;Amy Wang;Peng Zhao
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM T.J. Watson Research Center, Yorktown Heights, NY;IBM Toronto Laboratory, Markham, Ontario, Canada;IBM Toronto Laboratory, Markham, Ontario, Canada
Venue:
Proceedings of the 19th annual international conference on Supercomputing
Year:
2005

Citing 13
Cited 17

Automatic translation of FORTRAN programs to vector form

ACM Transactions on Programming Languages and Systems (TOPLAS)
Supercompilers for parallel and vector computers

Supercompilers for parallel and vector computers
Simple vector microprocessors for multimedia applications

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Exploiting superword level parallelism with multimedia instruction sets

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Automatic intra-register vectorization for the Intel architecture

International Journal of Parallel Programming
Compilation Techniques for Multimedia Processors

International Journal of Parallel Programming
A Vectorizing Compiler for Multimedia Extensions

International Journal of Parallel Programming
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Design and characterization of the Berkeley multimedia workload

Multimedia Systems
Vectorizing for a SIMdD DSP architecture

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance

Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
Vectorization for SIMD architectures with alignment constraints

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Efficient SIMD Code Generation for Runtime Alignment and Length Conversion

Proceedings of the international symposium on Code generation and optimization

Optimizing Compiler for the CELL Processor

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Exploiting Vector Parallelism in Software Pipelined Loops

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Multi-platform Auto-vectorization

Proceedings of the International Symposium on Code Generation and Optimization
Vector LLVA: a virtual vector instruction set for media processing

Proceedings of the 2nd international conference on Virtual execution environments
Using advanced compiler technology to exploit the performance of the Cell Broadband EngineTM architecture

IBM Systems Journal
Compiling for an indirect vector register architecture

Proceedings of the 5th conference on Computing frontiers
Overview of the IBM Blue Gene/P project

IBM Journal of Research and Development
Exploiting SIMD Parallelism with the CGiS Compiler Framework

Languages and Compilers for Parallel Computing
Outer-loop vectorization: revisited for short SIMD architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
On the exploitation of loop-level parallelism in embedded applications

ACM Transactions on Embedded Computing Systems (TECS)
CUDA-Lite: Reducing GPU Programming Complexity

Languages and Compilers for Parallel Computing
OpenMP to GPGPU: a compiler framework for automatic translation and optimization

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Generation of Pack Instruction Sequence for Media Processors Using Multi-Valued Decision Diagram

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Efficient SIMD code generation for irregular kernels

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Vapor SIMD: Auto-vectorize once, run everywhere

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Extending OpenMP* with vector constructs for modern multicore SIMD architectures

IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Polyhedral parallel code generation for CUDA

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers

Quantified Score

Hi-index	0.00

Visualization

Abstract

Automatic simdization for multimedia extensions faces several new challenges that are not present in traditional vectorization. Some of the new issues are due to the more restrictive SIMD architectures designed for multimedia extensions. Among them are alignment constraints, lack of memory gather and scatter support, and the short and fixed-length nature of SIMD vectors. Since these constraints affect some very basic components of a program, a compiler must not only provide solid solutions to individual issues, but also take an integrated approach to address these constraints in combination.In this paper, we propose a simdization framework that addresses several orthogonal aspects of simdization, such as alignment handling, simdization of loops with mixed data lengths, and SIMD parallelism extraction from different program scopes (from basic blocks to inner loops). The novelty of this framework is its ability to facilitate interactions between different techniques based on the simple intermediate representation of virtual vectors. Measurements on a PPC970 with a VMX SIMD unit indicate speedup factors of up to 8.11 for numerical/video/communication kernels and speedup factors of up to 2.16 for benchmarks, when automatic simdization is turned on.