Value-Profile Guided Stride Prefetching for Irregular Code

Authors:
Youfeng Wu;Mauricio J. Serrano;Rakesh Krishnaiyer;Wei Li;Jesse Fang
Affiliations:
-;-;-;-;-
Venue:
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Year:
2002

Citing 21
Cited 5

Software prefetching

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Optimally profiling and tracing programs

ACM Transactions on Programming Languages and Systems (TOPLAS)
Evaluating stream buffers as a secondary cache replacement

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
SPAID: software prefetching in pointer- and call-intensive environments

Proceedings of the 28th annual international symposium on Microarchitecture
Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Compiler-based prefetching for recursive data structures

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Memory-system design considerations for dynamically-scheduled processors

Proceedings of the 24th annual international symposium on Computer architecture
Data prefetching on the HP PA-8000

Proceedings of the 24th annual international symposium on Computer architecture
Value profiling

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Effective jump-pointer prefetching for linked data structures

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Predictor-directed stream buffers

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
When Caches Aren't Enough: Data Prefetching Techniques

Computer
Optimizing Software Data Prefetches with Rotating Registers

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Compiling for the Impulse Memory Controller

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Speculative Prefetching of Induction Pointers

CC '01 Proceedings of the 10th International Conference on Compiler Construction

Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Profile-guided post-link stride prefetching

ICS '02 Proceedings of the 16th international conference on Supercomputing
Stride prefetching by dynamically inspecting objects

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Prefetch injection based on hardware monitoring and object metadata

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Automatically generating symbolic prefetches for distributed transactional memories

Proceedings of the ACM/IFIP/USENIX 11th International Conference on Middleware

Quantified Score

Hi-index	0.00

Visualization

Abstract

Memory operations in irregular code are difficult to prefetch, as the future address of a memory location is hard to anticipate by a compiler. However, recent studies as well as our experience indicate that many irregular programs contain loads with near-constant strides. This paper presents a novel compiler technique to profile and prefetch for those loads. The profile captures not only the dominant stride values for each profiled load, but also the differences between the successive strides of the load. The profile information helps the compiler to classify load instructions into strongly or weakly strided and single-strided or phased multi-strided. The prefetching decisions guided by the load classifications are highly selective and beneficial. We obtain significant performance improvement for the CPU2000 integer programs running on Itanium驴 machines. For example, we achieve a 1.55x speedup for "181.mcf", 1.15x for "254.gap", 1.08x for "197.parser" and smaller gains in other benchmarks. We also show that the performance gain is stable across profile data sets and that the profiling overhead is low. These benefits make the new technique suitable for a production compiler.