A vectorizing Fortran compiler
IBM Journal of Research and Development
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
A language for shading and lighting calculations
SIGGRAPH '90 Proceedings of the 17th annual conference on Computer graphics and interactive techniques
Implementation of a portable nested data-parallel language
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel loop transformation techniques for vector-based multiprocessor systems
Parallel loop transformation techniques for vector-based multiprocessor systems
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Compilation techniques for multimedia processors
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1
A vectorizing compiler for multimedia extensions
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, Part 1
Scheduling and Automatic Parallelization
Scheduling and Automatic Parallelization
Implementing database operations using SIMD instructions
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A programming language
Multi-platform Auto-vectorization
Proceedings of the International Symposium on Code Generation and Optimization
Introducing Control Flow into Vectorized Code
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Outer-loop vectorization: revisited for short SIMD architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Fast Construction of SAH BVHs on the Intel Many Integrated Core (MIC) Architecture
IEEE Transactions on Visualization and Computer Graphics
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Ray tracing and volume rendering large molecular data on multi-core and many-core architectures
UltraVis '13 Proceedings of the 8th International Workshop on Ultrascale Visualization
Simple, portable and fast SIMD intrinsic programming: generic simd library
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Boost.SIMD: generic programming for portable SIMDization
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Hi-index | 0.00 |
SIMD instructions are common in CPUs for years now. Using these instructions effectively requires not only vectorization of code, but also modifications to the data layout. However, automatic vectorization techniques are often not powerful enough and suffer from restricted scope of applicability; hence, programmers often vectorize their programs manually by using intrinsics: compiler-known functions that directly expand to machine instructions. They significantly decrease programmer productivity by enforcing a very error-prone and hard-to-read assembly-like programming style. Furthermore, intrinsics are not portable because they are tied to a specific instruction set. In this paper, we show how a C-like language can be extended to allow for portable and efficient SIMD programming. Our extension puts the programmer in total control over where and how control-flow vectorization is triggered. We present a type system and a formal semantics of our extension and prove the soundness of the type system. Using our prototype implementation IVL that targets Intel's MIC architecture and SSE instruction set, we show that the generated code is roughly on par with handwritten intrinsic code.