High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Automatic intra-register vectorization for the Intel architecture
International Journal of Parallel Programming
Automatic parallelization for symmetric shared-memory multiprocessors
CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
Compiler and Runtime Support for Running OpenMP Programs on Pentium-and Itanium-Architectures
HIPS '03 Proceedings of the Eighth International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS'03)
A C++ infrastructure for automatic introduction and translation of OpenMP directives
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
A practical OpenMP compiler for system on chips
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
An exact data dependence testing method for quadratic expressions
Information Sciences: an International Journal
Exploiting fine-grain thread parallelism on multicore architectures
Scientific Programming - Software Development for Multi-core Computing Systems
A microbenchmark study of OpenMP overheads under nested parallelism
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Scheduling dynamic OpenMP applications over multicore architectures
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Nested parallelism in the OMPI OpenmP/C compiler
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
CUDA-NP: realizing nested thread-level parallelism in GPGPU applications
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
This paper presents the design and implementation of a parallelization framework and OpenMP runtime support in Intel C++ & Fortran compilers for exploiting nested parallelism in applications using OpenMP pragmas or directives. We conduct the performance evaluation of two multimedia applications parallelized with OpenMP pragmas and compiled with the Intel C++ compiler on Hyper-Threading Technology (HT) enabled multiprocessor systems. The performance results show that the multithreaded code generated by the Intel compiler achieved a speedup up to 4.69 on 4 processors with HT enabled for five different input video sequences for the H.264 encoder workload, and a 1.28 speedup on an HT enabled single-CPU system and 1.99 speedup on an HT-enabled dual-CPU system for the audio-visual speech recognition workload. The performance gain due to exploiting nested parallelism for leveraging Hyper-Threading Technology is up to 70% for two multimedia workloads under different multiprocessor system configurations. These results demonstrate that hyper-threading benefits can be achieved by exploiting nested parallelism through Intel compiler and runtime system support for OpenMP programs. .