ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Zero-cycle loads: microarchitecture support for reducing load latency
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Speculative execution via address prediction and data prefetching
ICS '97 Proceedings of the 11th international conference on Supercomputing
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Correlated load-address predictors
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Data prefetching by dependence graph precomputation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
A Compiler-Assisted Data Prefetch Controller
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Differential FCM: Increasing Value Prediction Accuracy by Improving Table Usage Efficiency
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Static Identification of Delinquent Loads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Compiler orchestrated prefetching via speculation and predication
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
A load-instruction unit for pipelined processors
IBM Journal of Research and Development
Data prefetching strategies for vector cache memories
IPPS '91 Proceedings of the Fifth International Parallel Processing Symposium
Hi-index | 0.00 |
Given the increasing gap between processors and memory, prefetching data into cache becomes an important strategy for preventing the processor from being starved of data. The success of any data prefetching scheme depends on three factors: timeliness, accuracy and overhead. In most hardware prefetching mechanism, the focus has been on accuracy – ensuring that the predicted address do turn out to be demanded in a later part of the code. In this paper, we introduce a simple hardware prefetching mechanism that targets delinquent loads, i.e. loads that account for a large proportion of the load misses in an application. Our results show that our prefetch strategy can reduce up to 45% of stall cycles of benchmarks running on a simulated out-of-order superscalar processor with an overhead of 0.0005 prefetch per CPU cycle.