Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Prefetching in supercomputer instruction caches
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Efficient detection of all pointer and array access errors
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Memory-system design considerations for dynamically-scheduled processors
Proceedings of the 24th annual international symposium on Computer architecture
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
A stateless, content-directed data prefetching mechanism
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Pointer cache assisted prefetching
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Stride prefetching by dynamically inspecting objects
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Quantifying Load Stream Behavior
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Ispike: A Post-link Optimizer for the Intel® Itanium® Architecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
IEEE Transactions on Computers
Data Cache Prefetching Using a Global History Buffer
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Performance driven data cache prefetching in a dynamic software optimization system
Proceedings of the 21st annual international conference on Supercomputing
A coarse-grained stream architecture for cryo-electron microscopy images 3D reconstruction
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
CPU speeds have increased faster than memory access latencies in recent years. As a result, in programs that suffer excessive cache misses, the CPU is increasingly stalled waiting for the memory system to deliver the requested memory line. Prefetching is a latency-hiding technique that tackles this problem: if the address of a memory line that will miss in the cache can be predicted sufficiently far in advance, the line can be prefetched into the cache before it is accessed, reducing the effective latency of that access. In this paper, we develop a novel software-only data prefetching scheme that works at the instruction level and exploits predictability in the access stream to prefetch memory lines that will be accessed in the future. Working at the instruction level gives us a global view of memory access patterns across function, module, and library boundaries. Conceptually, our scheme monitors the memory locations accessed by loads and stores, as well as their contents, and looks for instances of predictability in which the address of a load miss can be determined from a limited number of past accesses. We make the following contributions in this work. First, we present a novel prefetching strategy that unifies and generalizes a number of past approaches, each of which targets a specific source of address predictability: next-line prefetching, self-stride prefetching, "intra-iteration" stride prefetching, and same-object prefetching. In addition, it extends and generalizes the SPAID scheme for pointer- and call-intensive programs. Second, we present a new threshold-based approach that addresses the issues of prefetch accuracy, prefetch timeliness, and prefetch redundancy. Third, we assess our scheme both with a cache simulator and on a real machine, where we evaluate it with hardware performance counters.
Overall, we demonstrate that our approach achieves a significant reduction in L1 cache misses for several benchmarks on a real machine.
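To make the core idea concrete, the following is a minimal sketch of one source of address predictability the abstract names, self-stride prefetching at the instruction level, combined with a threshold-based accuracy filter. The table layout, threshold value, and class name are illustrative assumptions, not the paper's actual design or parameters.

```python
class StridePrefetcher:
    """Illustrative per-instruction (PC-indexed) stride detector.

    Each load PC tracks its last address, last observed stride, and a
    confidence counter. A prefetch address is emitted only after the
    same stride repeats enough times, a simple threshold-based filter
    for prefetch accuracy (hypothetical parameters, not the paper's).
    """

    def __init__(self, confidence_threshold=2):
        # pc -> (last_addr, last_stride, confidence)
        self.table = {}
        self.threshold = confidence_threshold

    def observe(self, pc, addr):
        """Record one executed load; return a prefetch address or None."""
        last_addr, last_stride, conf = self.table.get(pc, (None, None, 0))
        prefetch = None
        if last_addr is not None:
            stride = addr - last_addr
            # Count consecutive repeats of a non-zero stride.
            conf = conf + 1 if (stride == last_stride and stride != 0) else 0
            if conf >= self.threshold:
                # Stride is stable: predict the next access.
                prefetch = addr + stride
            last_stride = stride
        self.table[pc] = (addr, last_stride, conf)
        return prefetch


# Usage: a load at PC 0x400 streaming through 64-byte-spaced addresses.
p = StridePrefetcher()
results = [p.observe(0x400, a) for a in (0x1000, 0x1040, 0x1080, 0x10C0)]
# The first three observations build confidence; the fourth triggers
# a prefetch of the predicted next address, 0x1100.
```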