Compiler orchestrated prefetching via speculation and predication

Authors:
Rodric M. Rabbah;Hariharan Sandanagobalane;Mongkol Ekpanyapong;Weng-Fai Wong
Affiliations:
Massachusetts Institute of Technology;National University of Singapore;Georgia Institute of Technology;National University of Singapore
Venue:
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Year:
2004

Citing 31
Cited 9

Dynamic program slicing

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Software prefetching

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Software support for speculative loads

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Stride directed prefetching in scalar processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Sentinel scheduling: a model for compiler-controlled speculative execution

ACM Transactions on Computer Systems (TOCS)
The superblock: an effective technique for VLIW and superscalar compilation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Dynamic memory disambiguation using the memory conflict buffer

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Cache miss heuristics and preloading techniques for general-purpose programs

Proceedings of the 28th annual international symposium on Microarchitecture
Informing memory operations: providing memory performance feedback in modern processors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Compiler-based prefetching for recursive data structures

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Predictability of load/store instruction latencies

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Prefetching Using Markov Predictors

IEEE Transactions on Computers - Special issue on cache memory and related problems
Boosting beyond static scheduling in a superscalar processor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Dynamo: a transparent dynamic optimization system

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Predictor-directed stream buffers

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Data prefetching by dependence graph precomputation

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Post-pass binary adaptation for software-based speculative precomputation

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Dynamic speculative precomputation

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
DELI: a new run-time control point

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
A Compiler-Assisted Data Prefetch Controller

ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Informing Loads: Enabling Software to Observe and React to Memory Behavior

Informing Loads: Enabling Software to Observe and React to Memory Behavior
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Static Identification of Delinquent Loads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A cost-driven compilation framework for speculative parallelization of sequential programs

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research

IEEE Computer Architecture Letters

A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework

Proceedings of the International Symposium on Code Generation and Optimization
Dynamic memory optimization using pool allocation and prefetching

ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Relative factors in performance analysis of Java virtual machines

Proceedings of the 2nd international conference on Virtual execution environments
Phase-based adaptive recompilation in a JVM

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Data access history cache and associated data prefetching mechanisms

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
A compiler-directed data prefetching scheme for chip multiprocessors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
A reuse-aware prefetching scheme for scratchpad memory

Proceedings of the 48th Design Automation Conference
Trimaran: an infrastructure for research in instruction-level parallelism

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Targeted data prefetching

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper introduces a compiler orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. We focus the scope of the optimization on specific subsets of the program dependence graph that succinctly characterize the memory access pattern of both regular array-based applications and irregular pointer-intensive programs. We illustrate how program embedded precomputation via speculative execution can accurately predict and effectively prefetch future memory references with negligible overhead. The proposed techniques reduce the total running time of seven SPEC benchmarks and two OLDEN benchmarks by 27% on an Itanium 2 processor. The improvements are in addition to several state-of-the-art optimizations including software pipelining and data prefetching. In addition, we use cycle-accurate simulations to identify important and lightweight architectural innovations that further mitigate the memory system bottleneck. In particular, we focus on the notoriously challenging class of pointer-chasing applications, and demonstrate how they may benefit from a novel scheme of it sentineled prefetching. Our results for twelve SPEC benchmarks demonstrate that 45% of the processor stalls that are caused by the memory system are avoidable. The techniques in this paper can effectively mask long memory latencies with little instruction overhead, and can readily contribute to the performance of processors today.