For the past several years, CPU performance has outpaced that of dynamic RAM, the primary component of main memory. Designers have had to adopt increasingly aggressive techniques to reduce or hide the latency of main-memory accesses. Even so, it is still not uncommon for scientific programs to spend more than half their run time stalled on memory requests. This poor performance stems partly from the policies used to fetch data from main memory: processors typically request data only when it is needed, and then only if it is not already in the cache. Data prefetching, by contrast, brings data into the cache before the processor needs it; ideally, a prefetch completes just in time for the processor to access the data. Prefetching can nearly double the performance of some scientific applications running on commercial systems, but achieving this speedup requires choosing the prefetching technique best suited to the workload. This article reviews three popular prefetching techniques and examines the situations in which each works best.