Multiple Instruction Stream Processor
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 21st annual international conference on Supercomputing
Mapping parallelism to multi-cores: a machine learning based approach
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
Exploiting Thread-Level Parallelism (TLP) is a promising way to improve the performance of applications with the advent of general-purpose cost effective uni-processor and shared-memory multiprocessor systems. In this paper, we describe the OpenMP implementation in the Intel炉 C++ and Fortran compiler for Intel architectures. We present our major design consideration and decisions in the Intel compiler for generating efficient multithreaded codes guided by OpenMP directives and pragmas. We describe several transformation phases in the compiler for the OpenMP * parallelization. In addition to compiler support, the OpenMP runtime library is a critical part of the Intel compiler. We present runtime techniques developed in the Intel OpenMP runtime library for exploiting thread-level parallelism as well as integrating the OpenMP support with other forms of threading termed as sibling parallelism. The performance results of a set of benchmarks show a good speedup over well-optimized serial code performance on Intel炉 Pentium炉 and Itanium炉 processor-based systems.