PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Exploiting SIMD parallelism in DSP and multimedia algorithms using the AltiVec technology
ICS '99 Proceedings of the 13th international conference on Supercomputing
Exploiting a new level of DLP in multimedia applications
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Tarantula: a vector extension to the alpha architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Torrent Architecture Manual
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
Vectorizing for a SIMdD DSP architecture
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
Vectorization for SIMD architectures with alignment constraints
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
An innovative low-power high-performance programmable signal processor for digital communications
IBM Journal of Research and Development
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Superword-Level Parallelism in the Presence of Control Flow
Proceedings of the international symposium on Code generation and optimization
Efficient SIMD Code Generation for Runtime Alignment and Length Conversion
Proceedings of the international symposium on Code generation and optimization
An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
SWARP: a retargetable preprocessor for multimedia instructions: Research Articles
Concurrency and Computation: Practice & Experience - Compilers for Parallel Computers
Generation of permutations for SIMD processors
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Multi-platform Auto-vectorization
Proceedings of the International Symposium on Code Generation and Optimization
Optimizing data permutations for SIMD devices
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Vectorization techniques for the Blue Gene/L double FPU
IBM Journal of Research and Development
Induction variable analysis with delayed abstractions
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Optimizing data permutations for SIMD devices
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Sams: single-affiliation multiple-stride parallel memory scheme
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
Exploiting SIMD Parallelism with the CGiS Compiler Framework
Languages and Compilers for Parallel Computing
Outer-loop vectorization: revisited for short SIMD architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
On the exploitation of loop-level parallelism in embedded applications
ACM Transactions on Embedded Computing Systems (TECS)
A SIMD optimization framework for retargetable compilers
ACM Transactions on Architecture and Code Optimization (TACO)
Generation of Pack Instruction Sequence for Media Processors Using Multi-Valued Decision Diagram
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Access-pattern-aware on-chip memory allocation for SIMD processors
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
MacroSS: macro-SIMDization of streaming applications
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Dependence-based code generation for a CELL processor
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Generating SIMD vectorized permutations
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
SAMS multi-layout memory: providing multiple views of data to boost SIMD performance
Proceedings of the 24th ACM International Conference on Supercomputing
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
A new compilation technique for SIMD code generation across basic block boundaries
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
A taxonomy of accelerator architectures and their programming models
IBM Journal of Research and Development
Data layout transformation for stencil computations on short-vector SIMD architectures
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Automatic SIMD vectorization of fast fourier transforms for the larrabee and AVX instruction sets
Proceedings of the international conference on Supercomputing
Using machine learning to improve automatic vectorization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Efficient SIMD code generation for irregular kernels
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Mapping streaming languages to general purpose processors through vectorization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
A compiler framework for extracting superword level parallelism
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Dynamic trace-based analysis of vectorization potential of applications
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Data layout optimization for multi-valued containers in OpenCL
Journal of Parallel and Distributed Computing
Extending OpenMP* with vector constructs for modern multicore SIMD architectures
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Polyhedral parallel code generation for CUDA
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
From relational verification to SIMD loop synthesis
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
When polyhedral transformations meet SIMD code generation
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Vectorization past dependent branches through speculation
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
A Basic Linear Algebra Compiler
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
Most implementations of the Single Instruction Multiple Data (SIMD) model available today require that data elements be packed in vector registers. Operations on disjoint vector elements are not supported directly and require explicit data reorganization manipulations. Computations on non-contiguous and especially interleaved data appear in important applications, which can greatly benefit from SIMD instructions once the data is reorganized properly. Vectorizing such computations efficiently is therefore an ambitious challenge for both programmers and vectorizing compilers. We demonstrate an automatic compilation scheme that supports effective vectorization in the presence of interleaved data with constant strides that are powers of 2, facilitating data reorganization. We demonstrate how our vectorization scheme applies to dominant SIMD architectures, and present experimental results on a wide range of key kernels, showing speedups in execution time up to 3.7 for interleaving levels (stride) as high as 8.