To obtain maximum performance, many applications need to extend parallelism from multi-threading to the instruction-level (SIMD) parallelism available in many current (and future) multi-core architectures. Auto-vectorization has been used to exploit this SIMD level, but it is not always sufficient because of OpenMP semantics and limitations of compiler technology. In those cases, programmers must resort to low-level intrinsics or vendor-specific directives. We propose a new OpenMP directive, the simd directive, which lets programmers guide the vectorization process and enables a more productive and portable exploitation of the SIMD level. Our performance results show significant improvements over the current auto-vectorizing technology of Intel® Composer XE 2011.
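As an illustration of the kind of annotation described here, the following is a minimal sketch that uses the `#pragma omp simd` spelling later standardized in OpenMP 4.0. Whether the directive proposed in this work uses exactly this syntax or the same clauses is an assumption, and the function name `saxpy_simd` is hypothetical. The pragma tells the compiler that the loop iterations may be executed in SIMD lanes, so the caller is responsible for ensuring that `x` and `y` do not overlap.

```c
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i].
 * The simd pragma (OpenMP 4.0 spelling, assumed here) asks the compiler
 * to vectorize the loop without having to prove independence itself.
 * The caller must guarantee that x and y do not alias. */
void saxpy_simd(size_t n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler that does not have OpenMP support enabled ignores the pragma and compiles the loop as ordinary scalar code, so the annotated source stays portable. This contrasts with intrinsics, which tie the code to one instruction set.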