For the past several years, CPU performance has outpaced that of dynamic RAM, the primary component of main memory. Designers have had to adopt increasingly aggressive techniques to reduce or hide the latency of main-memory accesses. Even so, it is still not uncommon for scientific programs to spend more than half their run time stalled on memory requests. This poor performance stems partly from the policies used to fetch data from main memory: processors typically request data only when it is needed, and then only if it is not already in the cache. Data prefetching, by contrast, brings data into the cache before the processor needs it; ideally, a prefetch completes just in time for the processor to access the data. Prefetching can nearly double the performance of some scientific applications running on commercial systems, but achieving this speedup requires choosing the prefetching technique best suited to the workload. This article reviews three popular prefetching techniques and examines the situations in which each works best.