An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Using a user-level memory thread for correlation prefetching
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
TCP: Tag Correlating Prefetchers
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Branch History Guided Instruction Prefetching
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
AC/DC: An Adaptive Data Cache Prefetcher
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Temporal Streaming of Shared Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
Data Cache Prefetching Using a Global History Buffer
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework
Proceedings of the International Symposium on Code Generation and Optimization
Proceedings of the 33rd annual international symposium on Computer Architecture
BioBench: A Benchmark Suite of Bioinformatics Applications
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
IBM Journal of Research and Development
Spatio-temporal memory streaming
Proceedings of the 36th annual international symposium on Computer architecture
Spatio-temporal memory streaming
Proceedings of the 36th annual international symposium on Computer architecture
Timing local streams: improving timeliness in data prefetching
Proceedings of the 24th ACM International Conference on Supercomputing
Global-aware and multi-order context-based prefetching for high-performance processors
International Journal of High Performance Computing Applications
Unified memory optimizing architecture: memory subsystem control with a unified predictor
Proceedings of the 26th ACM international conference on Supercomputing
Prefetching and cache management using task lifetimes
Proceedings of the 27th international ACM conference on International conference on supercomputing
S/DC: a storage and energy efficient data prefetcher
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
A Prefetching Scheme Exploiting both Data Layout and Access History on Disk
ACM Transactions on Storage (TOS)
APOGEE: adaptive prefetching on GPUs for energy efficiency
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Linearizing irregular memory accesses for improved correlated prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The case of using multiple streams in streaming
International Journal of Automation and Computing
Hi-index | 0.00 |
Data prefetching has long been an important technique to amortize the effects of the memory wall, and is likely to remain so in the current era of multi-core systems. Most prefetchers operate by identifying patterns and correlations in the miss address stream. Separating streams according to the memory access instruction that generates the misses is an effective way of filtering out spurious addresses from predictable streams. On the other hand, by localizing streams based on the memory access instructions, such prefetchers both lose the complete time sequence information of misses and can only issue prefetches for a single memory access instruction at a time. This paper proposes a novel class of prefetchers based on the idea of linking various localized streams into predictable chains of missing memory access instructions such that the prefetcher can issue prefetches along multiple streams. In this way the prefetcher is not limited to prefetching deeply for a single missing memory access instruction but can instead adaptively prefetch for other memory access instructions closer in time. Experimental results show that the proposed prefetcher consistently achieves better performance than a state-of-the-art prefetcher -- 10% on average, being only outperformed in very few cases and then by only 2%, and outperforming that prefetcher by as much as 55% -- while consuming the same amount of memory bandwidth.