Cost-Effective Compiler Directed Memory Prefetching and Bypassing

Authors:
Daniel Ortega;Eduard Ayguadé;Jean-Loup Baer;Mateo Valero
Affiliations:
-;-;-;-
Venue:
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Year:
2002

Citing 11
Cited 5

Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Sentinel scheduling for VLIW and superscalar processors

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Compiler-based prefetching for recursive data structures

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Tango: a hardware-based data prefetching technique for superscalar processors

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Speculative execution via address prediction and data prefetching

ICS '97 Proceedings of the 11th international conference on Supercomputing
Computer architecture (2nd ed.): a quantitative approach

Computer architecture (2nd ed.): a quantitative approach
Classifying load and store instructions for memory renaming

ICS '99 Proceedings of the 13th international conference on Supercomputing
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A novel renaming mechanism that boosts software prefetching

ICS '01 Proceedings of the 15th international conference on Supercomputing
Execution-based prediction using speculative slices

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors

IEEE Transactions on Computers

A first glance at Kilo-instruction based multiprocessors

Proceedings of the 1st conference on Computing frontiers
Merging, sorting and matrix operations on the SOME-bus multiprocessor architecture

Future Generation Computer Systems - Special issue: Advanced services for clusters and internet computing
Exploring the limits of prefetching

IBM Journal of Research and Development - Electrochemical technology in microelectronics
Dynamic memory instruction bypassing

International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Meeting midway: improving CMP performance with memory-side prefetching

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

Ever increasing memory latencies and deeper pipelines push memory farther from the processor. Prefetching techniques aim is to bridge these two gaps by fetching data in advance to both the L1 cache and the register file. Our main contribution in this paper is a hybrid approach to the prefetching problem that combines both software and hardware prefetching in a cost-effective way by needing very little hardware support and impacting minimally the design of the processor pipeline. The prefetcher is built on-top of a static memory instruction bypassing, which is in charge of bringing prefetched values in the register file. In this paper we also present a thorough analysis of the limits of both prefetching and memory instruction bypassing. We also compare our prefetching technique with a prior speculative proposal that attacked the same problem, and we show that at much lower cost, our hybrid solution is better than a realistic implementation of speculative prefetching and bypassing. In average, our hybrid implementation achieves a 13% speed-up improvement over a version with software prefetching in a subset of numerical applications and an average of 43% over aversion with no software prefetching (achieving up to a 102% for specific benchmarks).