A hardware prefetching mechanism named Speculative Prefetching is proposed. The scheme detects vector accesses issued by a load/store instruction and prefetches the corresponding data. It requires no software support, and in some cases it identifies regular accesses more effectively than software techniques do. The tradeoffs of its hardware implementation are discussed extensively in order to fine-tune the mechanism. Experiments show that the average memory access time of regular codes is brought within 10% of the optimum for processors with typical issue rates, while irregular codes see only a small improvement but are never slowed down. The performance of the scheme is discussed over a wide range of parameters.
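The detection step described above can be illustrated with a small software model. The following is only a minimal sketch of generic per-instruction stride detection, not the hardware design proposed in the paper: the table organization, the two-matching-strides confirmation rule, and the names `Entry`, `StridePrefetcher`, and `access` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    """State kept for one load/store instruction (hypothetical layout)."""
    last_addr: int
    stride: int = 0


class StridePrefetcher:
    """Toy model: one table entry per load/store, indexed by its PC."""

    def __init__(self) -> None:
        self.table: Dict[int, Entry] = {}

    def access(self, pc: int, addr: int) -> Optional[int]:
        """Record one memory access; return an address to prefetch, or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this instruction is seen: nothing to predict yet.
            self.table[pc] = Entry(last_addr=addr)
            return None

        stride = addr - entry.last_addr
        # Assumed rule: prefetch only when the same non-zero stride
        # is observed twice in a row for this instruction.
        confirmed = stride != 0 and stride == entry.stride
        entry.stride = stride
        entry.last_addr = addr
        return addr + stride if confirmed else None


# A regular (vector-like) access stream from one instruction.
regular = StridePrefetcher()
regular_out = [regular.access(0x400, a) for a in (1000, 1008, 1016, 1024)]

# An irregular stream from another instruction: no stride is ever confirmed.
irregular = StridePrefetcher()
irregular_out = [irregular.access(0x404, a) for a in (10, 50, 20, 90)]
```

In this model the regular stream starts producing prefetch addresses from its third access onward, while the irregular stream never triggers a prefetch. That is consistent with the abstract's observation that irregular codes gain little but are not penalized, although a real implementation would also have to account for table capacity, timing, and cache pollution.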