Automatic intra-register vectorization for the Intel architecture

Authors:
Aart J. C. Bik;Milind Girkar;Paul M. Grey;Xinmin Tian
Affiliations:
Intel Corporation, 2200 Mission College Blvd. SC12-301, Santa Clara, California;Intel Corporation, 2200 Mission College Blvd. SC12-301, Santa Clara, California;Intel Corporation, 2200 Mission College Blvd. SC12-301, Santa Clara, California;Intel Corporation, 2200 Mission College Blvd. SC12-301, Santa Clara, California
Venue:
International Journal of Parallel Programming
Year:
2002

Abstract

Recent extensions to the Intel® Architecture feature the SIMD technique to enhance the performance of computationally intensive applications that perform the same operation on different elements in a data set. To date, much of the code that exploits these extensions has been hand-coded. The task of the programmer is substantially simplified, however, if a compiler does this exploitation automatically. The high-performance Intel® C++/Fortran compiler supports automatic translation of serial loops into code that uses the SIMD extensions to the Intel® Architecture. This paper provides a detailed overview of the automatic vectorization methods used by this compiler together with an experimental validation of their effectiveness.
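
To make the idea of translating a serial loop into SIMD form concrete, here is a minimal sketch in portable C. It is an illustration only, not the output or the algorithm of the Intel compiler: the function names `add_scalar` and `add_strip_mined` and the vector length `VL` are assumptions introduced for this example. The inner lane loop stands in for a single packed instruction that would operate on `VL` elements held in one register.

```c
#include <stddef.h>

/* Serial form: one element is processed per iteration. */
void add_scalar(float *a, const float *b, const float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Hypothetical vector length: four 32-bit floats fit in one 128-bit register. */
#define VL 4

/* Vectorized shape of the same loop, emulated with plain C. */
void add_strip_mined(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;

    /* Main loop: each outer iteration models one packed add over VL lanes. */
    for (; i + VL <= n; i += VL) {
        for (size_t lane = 0; lane < VL; lane++)
            a[i + lane] = b[i + lane] + c[i + lane];
    }

    /* Cleanup loop: handles the n % VL elements left over at the end. */
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}
```

Both functions compute the same result for non-overlapping arrays. A real vectorizer would additionally have to establish that the loop carries no data dependence that forbids processing `VL` iterations at once, and would typically deal with memory alignment, before replacing the lane loop with actual SIMD instructions.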