Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
AC/DC: An Adaptive Data Cache Prefetcher
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Temporal Streaming of Shared Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
Data Cache Prefetching Using a Global History Buffer
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Proceedings of the 33rd annual international symposium on Computer Architecture
ACM SIGARCH Computer Architecture News
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Data access history cache and associated data prefetching mechanisms
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Server-based data push architecture for multi-processor environments
Journal of Computer Science and Technology
Spatio-temporal memory streaming
Proceedings of the 36th annual international symposium on Computer architecture
Stream chaining: exploiting multiple levels of correlation in data prefetching
Proceedings of the 36th annual international symposium on Computer architecture
POWER4 system microarchitecture
IBM Journal of Research and Development
An Adaptive Data Prefetcher for High-Performance Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Hi-index | 0.00 |
Data prefetching technique is widely used to bridge the growing performance gap between processor and memory. Numerous prefetching techniques have been proposed to exploit data patterns and correlations in the miss address stream. In general, the miss addresses are grouped by some common characteristics, such as program counter or memory region they belong to, into localized streams to improve prefetch accuracy and coverage. However, the existing stream localization technique lacks the timing information of misses. This drawback can lead to a large fraction of untimely prefetches, which in turn limits the effectiveness of prefetching, wastes precious bandwidth and leads to high cache pollution potentially. This paper proposes a novel mechanism named stream timing technique that can largely reduce untimely prefetches and in turn increase the overall performance. Based on the proposed stream timing technique, we extend the conventional stride prefetcher and propose a new stride prefetcher called Time-Aware Stride (TAS) prefetcher. We have carried out extensive simulation experiments to verify the design of the stream timing technique and the TAS prefetcher. The simulation results show that the proposed stream timing technique is promising in reducing untimely prefetches and the IPC improvement of TAS prefetcher outperforms the existing stride prefetcher by 11%.