Data prefetching has proven effective at hiding memory latency during program execution. Most current data prefetching schemes target only array references with constant strides; for array references with non-constant strides, they lose most of their effectiveness. In this paper, we propose a novel data prefetching scheme based on a property, called the Self-Containness of Variables, that is found in most loop-rich applications. We observed that the update pattern of a self-contained variable in a loop can be accurately predicted. The predicted value can then be used for accurate data prefetching if the variable is the only loop-variant component of the address expression in a memory access instruction. With suitable hardware support, this scheme can prefetch data from recursive data structures in addition to array elements. Moreover, the coverage of this scheme is highly selectable: it can be customized easily to fit the cost-performance requirements of different processors designed for different applications.
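The idea of predicting a loop variable's next value and prefetching with it can be illustrated with a small software sketch. This is not the scheme proposed in the paper, which relies on hardware support. It assumes that a "self-contained" variable is one whose next value is computed only from its own current value and loop-invariant quantities. The function name `sum_nonconstant_stride` and the doubling update are hypothetical, chosen only to produce a non-constant stride, and `__builtin_prefetch` is a GCC/Clang builtin rather than anything from the paper.

```c
#include <stddef.h>

/* Hypothetical illustration, not the paper's hardware mechanism.
 * The index i is updated only from its own previous value (i = i * 2),
 * which is our assumed reading of a "self-contained" variable.
 * The stride between accesses (1, 2, 4, 8, ...) is not constant,
 * yet the next index is fully predictable one iteration ahead. */
long sum_nonconstant_stride(const long *a, size_t n)
{
    long sum = 0;
    size_t i = 1;

    while (i < n) {
        /* Predict the next value of i from its current value alone. */
        size_t next = i * 2;

        /* i is the only loop-variant part of the address &a[i],
         * so the predicted value yields the next address directly.
         * Arguments: address, 0 = read access, 1 = low temporal locality. */
        if (next < n)
            __builtin_prefetch(&a[next], 0, 1);

        sum += a[i];   /* use the element for the current iteration */
        i = next;      /* the self-contained update */
    }
    return sum;
}
```

A constant-stride prefetcher would see strides of 1, 2, 4, ... elements here and have no fixed stride to lock onto, whereas computing `next` from `i` gives the exact address one iteration early. The sketch assumes `n` is small enough that `i * 2` cannot overflow `size_t`.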