Exploiting Thread-Level Parallelism (TLP) is a promising way to improve application performance now that cost-effective, general-purpose uniprocessor and shared-memory multiprocessor systems are available. In this paper, we describe the OpenMP implementation in the Intel® C++ and Fortran compilers for Intel platforms. We present our major design considerations and decisions in the Intel compiler for generating efficient multithreaded code guided by OpenMP directives and pragmas. We describe several transformation phases in the compiler for OpenMP parallelization. In addition to compiler support, the OpenMP runtime library is a critical part of the Intel compiler. We present runtime techniques developed in the Intel OpenMP runtime library for exploiting thread-level parallelism and for integrating OpenMP support with other forms of threading, termed sibling parallelism. Performance results on a set of benchmarks show good speedups over well-optimized serial code on Intel® Pentium- and Itanium-processor-based systems.
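To make "multithreaded code guided by OpenMP directives" and "runtime library" concrete, the following minimal sketch contrasts a loop annotated with an OpenMP directive against a hand-written approximation of the kind of lowering an OpenMP compiler performs. The parallel region is outlined into a separate function, a small runtime forks a team of threads to execute it, each thread takes a static block of iterations, and the reduction partials are combined after the join. This is not the Intel compiler's actual output. The names `region_args`, `outlined_body`, `sum_with_pragma`, and `sum_with_outlining` are hypothetical, and POSIX threads stand in for the Intel OpenMP runtime library, whose real entry points are not shown.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64

/* Directive form: what the programmer writes. Without an OpenMP-enabled
   compiler the pragma is ignored and the loop runs serially with the
   same result. */
long sum_with_pragma(long n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Hypothetical argument record: the shared and private data that an
   outlining transformation would pass to the outlined region body. */
typedef struct {
    long n;        /* shared: loop trip count */
    int tid;       /* this thread's id within the team */
    int nthreads;  /* team size */
    long partial;  /* private partial result for the reduction */
} region_args;

/* Outlined body of the parallel loop. Static block scheduling gives
   each thread one contiguous chunk of the iteration space. */
static void *outlined_body(void *p) {
    region_args *a = p;
    long chunk = (a->n + a->nthreads - 1) / a->nthreads;
    long lo = (long)a->tid * chunk;
    long hi = (lo + chunk < a->n) ? lo + chunk : a->n;
    long local = 0;
    for (long i = lo; i < hi; i++)  /* empty when lo >= hi */
        local += i;
    a->partial = local;
    return NULL;
}

/* Hypothetical mini "runtime": fork a team, run the outlined body on
   every member (the calling thread acts as thread 0), join, and then
   combine the reduction partials. */
long sum_with_outlining(long n, int nthreads) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t workers[MAX_THREADS];
    region_args args[MAX_THREADS];
    int spawned[MAX_THREADS] = {0};

    for (int t = 0; t < nthreads; t++)
        args[t] = (region_args){ .n = n, .tid = t, .nthreads = nthreads, .partial = 0 };

    for (int t = 1; t < nthreads; t++) {
        /* If a worker cannot be created, run its chunk on the calling
           thread so the result is still correct (serialized fallback). */
        if (pthread_create(&workers[t], NULL, outlined_body, &args[t]) == 0)
            spawned[t] = 1;
        else
            outlined_body(&args[t]);
    }
    outlined_body(&args[0]);  /* the calling thread participates as thread 0 */

    long sum = 0;
    for (int t = 0; t < nthreads; t++) {
        if (spawned[t])
            pthread_join(workers[t], NULL);  /* join acts as the end-of-region barrier */
        sum += args[t].partial;              /* combine reduction partials */
    }
    return sum;
}
```

For example, `sum_with_outlining(100, 4)` and `sum_with_pragma(100)` both compute 0 + 1 + ... + 99. The outlined version makes explicit what the directive hides: the data environment packaged into a record, work partitioning per thread, and the fork/join performed by a runtime rather than by user code.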