Hardware Support for Prescient Instruction Prefetch

Authors:
Tor M. Aamodt;Paul Chow;Per Hammarlund;Hong Wang;John P. Shen
Affiliations:
University of Toronto;University of Toronto;Intel Corporation;Intel Corporation;Intel Corporation
Venue:
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Year:
2004

Citing 0
Cited 6

Fighting the memory wall with assisted execution

Proceedings of the 1st conference on Computing frontiers
Helper threads via virtual multithreading on an experimental itanium® 2 processor-based platform

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Control Flow Optimization Via Dynamic Reconvergence Prediction

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Analyzing the worst-case execution time for instruction caches with prefetching

ACM Transactions on Embedded Computing Systems (TECS)
Inter-core prefetching for multicore processors using migrating helper threads

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
SHIFT: shared history instruction fetch for lean-core server processors

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes and evaluates hardware mechanisms for supporting prescient instruction prefetch 驴 an approach to improving single-threaded application performance by using helper threads to perform instruction prefetch. We demonstrate the need for enabling store-to-load communication and selective instruction execution when directly pre-executing future regions of an application that suffer I-cache misses. Two novel hardware mechanisms, safe-store and YAT-bits, are introduced that help satisfy these requirements. This paper also proposes and evaluates .nite state machine recall, a technique for limiting pre-execution to branches that are hard to predict by leveraging a counted I-prefetch mechanism. On a research Itanium®SMT processor with next line and streaming I-prefetch mechanisms that incurs latencies representative of next generation processors, prescient instruction prefetch can improve performance by an average of 10.0% to 22% on a set of SPEC 2000 benchmarks that suffer significant I-cache misses. Prescient instruction prefetch is found to be competitive against even the most aggressive research hardware instruction prefetch technique: fetch directed instruction prefetch.