Extending OpenMP* with vector constructs for modern multicore SIMD architectures

Authors:
Michael Klemm;Alejandro Duran;Xinmin Tian;Hideki Saito;Diego Caballero;Xavier Martorell
Affiliations:
Intel Corporation;Barcelona Supercomputing Center, Spain;Intel Corporation;Intel Corporation;Barcelona Supercomputing Center, Spain;Intel Corporation, USA, Universitat Politecnica de Catalunya, Spain
Venue:
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Year:
2012

Abstract

To obtain maximum performance, many applications need to extend parallelism from multi-threading to the instruction-level (SIMD) parallelism available in many current (and future) multi-core architectures. While auto-vectorization technology has been used to exploit this SIMD level, it is not always sufficient because of OpenMP semantics and limitations of compiler technology. In those cases, programmers must resort to low-level intrinsics or vendor-specific directives. We propose a new OpenMP directive: the simd directive. This directive allows programmers to guide the vectorization process, enabling more productive and portable exploitation of the SIMD level. Our performance results show significant improvements over the current auto-vectorizing technology of the Intel® Composer XE 2011.
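
As an illustration of what guiding vectorization with a directive can look like, the following is a minimal sketch and is not taken from the paper. It assumes the `#pragma omp simd` spelling that was later standardized in OpenMP 4.0; the exact syntax and clauses proposed by the authors may differ. The function name `saxpy_simd` and the kernel itself are hypothetical.

```c
/* Hypothetical kernel, not from the paper: computes y[i] = a * x[i] + y[i]. */
void saxpy_simd(int n, float a, const float *x, float *y)
{
    /* Assumption: OpenMP 4.0-style spelling of the simd directive.
       The programmer asserts that the iterations are independent
       (x and y do not overlap), so the compiler may vectorize the
       loop even when its own dependence analysis is inconclusive. */
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler without OpenMP support ignores the pragma and the loop still runs as ordinary scalar code, which is what makes a directive more portable than intrinsics tied to one instruction set.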