Optimizing Software Data Prefetches with Rotating Registers

Authors:
Gautam Doshi;Rakesh Krishnaiyer;Kalyan Muthukumar
Affiliations:
-;-;-
Venue:
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Year:
2001

Citing 0
Cited 11

Profile-guided post-link stride prefetching

ICS '02 Proceedings of the 16th international conference on Supercomputing
Scientific computing on the Itanium™ processor

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Value-Profile Guided Stride Prefetching for Irregular Code

CC '02 Proceedings of the 11th International Conference on Compiler Construction
Vectorizing for a SIMdD DSP architecture

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Impact of Compiler-based Data-Prefetching Techniques on SPEC OMP Application Performance

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Scientific computing on the Itanium® processor

Scientific Programming - Best papers from SC 2001
A compiler-directed data prefetching scheme for chip multiprocessors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Energy-efficient hardware data prefetching

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Improving the performance of GCC by exploiting IA-64 architectural features

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Compositional approach applied to loop specialization

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Abstract: Software data prefetching is a well-known technique to improve the performance of programs that suffer many cache misses at several levels of memory hierarchy. However, it has significant overhead in terms of increased code size, additional instructions, and possibly increased memory bus traffic due to redundant prefetches. This paper presents two novel methods to reduce the overhead of software data prefetching and improve the program performance by optimized prefetch scheduling. These methods exploit the availability of rotating registers and predication in architectures such as the ItaniumTM architecture. The methods (1) minimize redundant prefetches, (2) reduce the number of issue slots needed for prefetch instructions, and (3) avoid branch mispredict penalties - all with minimal code size increase. Compared to traditional data prefetching techniques, these methods (i) do not require loop unrolling, (ii) do not require predicate computations and (iii) require fewer machine resources. One of these methods has been implemented in the Intel Production Compiler for the Itanium TM processor. This technique is compared with traditional approaches for software prefetching and experimental results are presented based on the floating-point benchmark suite of CPU2000.