In this paper, we evaluate the benefits achievable from software data-prefetching techniques for OpenMP* C/C++ and Fortran benchmark programs, using the framework of the Intel production compiler for the Intel® Itanium® 2 processor. Prior work on software data prefetching has primarily focused on benchmark performance in the context of a few prefetching schemes developed in research compilers. In contrast, our study examines the impact of an extensive set of software data-prefetching schemes on the performance of multi-threaded execution, using the full set of SPEC OMPM2001 applications with a production compiler on a commercial multiprocessor system. We present performance results showing that the compiler-based software data prefetching supported in the Intel compiler yields significant performance gains on an Intel® Itanium® 2 processor-based SGI Altix* 32-way shared-memory multiprocessor system: 11.88% to 99.85% for 6 of the 11 applications, 3.83% to 6.96% for 4 of the 11, and less than 1% for only one application.