Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Speculative dynamic vectorization
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Increasing and Detecting Memory Address Congruence
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
Vectorization for SIMD architectures with alignment constraints
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Superword-Level Parallelism in the Presence of Control Flow
Proceedings of the international symposium on Code generation and optimization
Efficient SIMD Code Generation for Runtime Alignment and Length Conversion
Proceedings of the international symposium on Code generation and optimization
An integrated simdization framework using virtual vectors
Proceedings of the 19th annual international conference on Supercomputing
Multi-platform Auto-vectorization
Proceedings of the International Symposium on Code Generation and Optimization
Optimizing Dynamic Binary Translation for SIMD Instructions
Proceedings of the International Symposium on Code Generation and Optimization
Vector LLVA: a virtual vector instruction set for media processing
Proceedings of the 2nd international conference on Virtual execution environments
Liquid SIMD: Abstracting SIMD Hardware using Lightweight Dynamic Mapping
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Outer-loop vectorization: revisited for short SIMD architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Polyhedral-Model Guided Loop-Nest Auto-Vectorization
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Processor virtualization and split compilation for heterogeneous multicore embedded systems
Proceedings of the 47th Design Automation Conference
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Speculatively vectorized bytecode
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
SIMD defragmenter: efficient ILP realization on data-parallel architectures
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Vectorization technology to improve interpreter performance
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Hybrid type legalization for a sparse SIMD instruction set
ACM Transactions on Architecture and Code Optimization (TACO)
A Basic Linear Algebra Compiler
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
On the advantage of time-varying diversity of workload on functionally asymmetric multi-core
Proceedings of International Workshop on Adaptive Self-tuning Computing Systems
JIT technology with C/C++: Feedback-directed dynamic recompilation for statically compiled languages
ACM Transactions on Architecture and Code Optimization (TACO)
Boost.SIMD: generic programming for portable SIMDization
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Hi-index | 0.00 |
Just-in-Time (JIT) compiler technology offers portability while facilitating target- and context-specific specialization. Single-Instruction-Multiple-Data (SIMD) hardware is ubiquitous and markedly diverse, but can be difficult for JIT compilers to efficiently target due to resource and budget constraints. We present our design for a synergistic auto-vectorizing compilation scheme. The scheme is composed of an aggressive, generic offline stage coupled with a lightweight, target-specific online stage. Our method leverages the optimized intermediate results provided by the first stage across disparate SIMD architectures from different vendors, having distinct characteristics ranging from different vector sizes, memory alignment and access constraints, to special computational idioms. We demonstrate the effectiveness of our design using a set of kernels that exercise innermost loop, outer loop, as well as straight-line code vectorization, all automatically extracted by the common offline compilation stage. This results in performance comparable to that provided by specialized monolithic offline compilers. Our framework is implemented using open-source tools and standards, thereby promoting interoperability and extendibility.