Data Cache Prefetching Using a Global History Buffer

Authors:
Kyle J. Nesbit; James E. Smith
Affiliations:
University of Wisconsin - Madison; University of Wisconsin - Madison
Venue:
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Year:
2004

Cited by (55):

Effective stream-based and execution-based data prefetching
Proceedings of the 18th annual international conference on Supercomputing

MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture

Temporal Streaming of Shared Memory
Proceedings of the 32nd annual international symposium on Computer Architecture

Store-Ordered Streaming of Shared Memory
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques

Spectral prefetcher: An effective mechanism for L2 cache prefetching
ACM Transactions on Architecture and Code Optimization (TACO)

Spatial Memory Streaming
Proceedings of the 33rd annual international symposium on Computer Architecture

Efficient emulation of hardware prefetchers via event-driven helper threading
Proceedings of the 15th international conference on Parallel architectures and compilation techniques

Data prefetching in a cache hierarchy with high bandwidth and capacity
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures

Stealth prefetching
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems

Reducing Cache Pollution via Dynamic Data Prefetch Filtering
IEEE Transactions on Computers

Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture

Analysis of hardware prefetching across virtual page boundaries
Proceedings of the 4th international conference on Computing frontiers

Data prefetching in a cache hierarchy with high bandwidth and capacity
ACM SIGARCH Computer Architecture News

Predictor virtualization
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems

Data access history cache and associated data prefetching mechanisms
Proceedings of the 2007 ACM/IEEE conference on Supercomputing

Focused prefetching: performance oriented prefetching based on commit stalls
Proceedings of the 22nd annual international conference on Supercomputing

Combating I-O bottleneck using prefetching: model, algorithms, and ramifications
The Journal of Supercomputing

Extending CC-NUMA systems to support write update optimizations
Proceedings of the 2008 ACM/IEEE conference on Supercomputing

Low-Cost Adaptive Data Prefetching
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing

Revisiting Cache Block Superloading
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers

PFetch: software prefetching exploiting temporal predictability of memory access streams
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture

Temporal instruction fetch streaming
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture

Access map pattern matching for data cache prefetch
Proceedings of the 23rd international conference on Supercomputing

Spatio-temporal memory streaming
Proceedings of the 36th annual international symposium on Computer architecture

Stream chaining: exploiting multiple levels of correlation in data prefetching
Proceedings of the 36th annual international symposium on Computer architecture

MPTLsim: a cycle-accurate, full-system simulator for x86-64 multicore architectures with coherent caches
ACM SIGARCH Computer Architecture News

COMPASS: a programmable data prefetcher using idle GPU shaders
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems

Chameleon: Virtualizing idle acceleration cores of a heterogeneous multicore processor for caching and prefetching
ACM Transactions on Architecture and Code Optimization (TACO)

Extending data prefetching to cope with context switch misses
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design

Timing local streams: improving timeliness in data prefetching
Proceedings of the 24th ACM International Conference on Supercomputing

NTPT: on the end-to-end traffic prediction in the on-chip networks
Proceedings of the 47th Design Automation Conference

An Adaptive Data Prefetcher for High-Performance Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing

Coterminous locality and coterminous group data prefetching on chip-multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Extended histories: improving regularity and performance in correlation prefetchers
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers

Template-based memory access engine for accelerators in SoCs
Proceedings of the 16th Asia and South Pacific Design Automation Conference

Global-aware and multi-order context-based prefetching for high-performance processors
International Journal of High Performance Computing Applications

Algorithmic ramifications of prefetching in memory hierarchy
HiPC'06 Proceedings of the 13th international conference on High Performance Computing

The migration prefetcher: Anticipating data promotion in dynamic NUCA caches
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers

Dynamic co-allocation of level one caches
ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems

When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)

Proactive instruction fetch
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

Multi-level hardware prefetching using low complexity delta correlating prediction tables with partial matching
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers

Unified memory optimizing architecture: memory subsystem control with a unified predictor
Proceedings of the 26th ACM international conference on Supercomputing

Boosting mobile GPU performance with a decoupled access/execute fragment processor
Proceedings of the 39th Annual International Symposium on Computer Architecture

Application data prefetching on the IBM blue gene/Q supercomputer
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Algorithm-level Feedback-controlled Adaptive data prefetcher: Accelerating data access for high-performance processors
Parallel Computing

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems

Orchestrated scheduling and prefetching for GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture

Application-driven end-to-end traffic predictions for low power NoC design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

S/DC: a storage and energy efficient data prefetcher
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe

Network-aware data caching and prefetching for cloud-hosted metadata retrieval
NDM '13 Proceedings of the Third International Workshop on Network-Aware Data Management

SHIFT: shared history instruction fetch for lean-core server processors
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

The case of using multiple streams in streaming
International Journal of Automation and Computing

Practical models for energy-efficient prefetching in mobile embedded systems
Microprocessors & Microsystems

Abstract

A new structure for implementing data cache prefetching is proposed and analyzed via simulation. The structure is based on a Global History Buffer that holds the most recent miss addresses in FIFO order. Linked lists within this global history buffer connect addresses that have some common property, e.g., they were all generated by the same load instruction. The Global History Buffer can be used for implementing a number of previously proposed prefetch methods, as well as new ones. Prefetching with the Global History Buffer has two significant advantages over conventional table prefetching methods. First, the use of a FIFO history buffer can improve the accuracy of correlation prefetching by eliminating stale data from the table. Second, the Global History Buffer contains a more complete (and intact) picture of cache miss history, creating opportunities to design more effective prefetching methods. Global History Buffer prefetching can increase correlation prefetching performance by 20% and cut its memory traffic by 90%. Furthermore, the Global History Buffer can make correlations within a load's address stream, which can increase stride prefetching performance by 6%. Collectively, the Global History Buffer prefetching methods perform as well as or better than the conventional prefetching methods studied on 14 of 15 benchmarks.
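
The structure described in the abstract can be illustrated with a small software model. The Python sketch below is only an assumption-laden illustration, not the authors' hardware design: the class name `GlobalHistoryBuffer`, the choice of the load's program counter as the linking key, the use of global sequence numbers in place of hardware pointers, and the `stride_prefetch` helper are all hypothetical. It shows a fixed-size FIFO of miss addresses, per-key linked lists threaded through that FIFO, and how entries that have been overwritten simply drop off the end of a chain, which is one way to picture how stale history is eliminated.

```python
class GlobalHistoryBuffer:
    """Toy model: FIFO of recent miss addresses plus an index table (hypothetical names)."""

    def __init__(self, size):
        self.size = size
        self.addr = [None] * size  # miss address held in each FIFO slot
        self.prev = [None] * size  # sequence number of the previous miss with the same key
        self.head = 0              # number of misses inserted so far
        self.index = {}            # key (assumed here: load PC) -> newest sequence number

    def _live(self, seq):
        # A sequence number is valid only while its slot has not been overwritten.
        return seq is not None and 0 < self.head - seq <= self.size

    def insert(self, key, miss_addr):
        # Overwrite the oldest slot (FIFO order) and link it to the key's previous miss.
        slot = self.head % self.size
        self.addr[slot] = miss_addr
        self.prev[slot] = self.index.get(key)
        self.index[key] = self.head
        self.head += 1

    def history(self, key, depth):
        # Walk the key's linked list, newest first, stopping at the first stale link.
        out = []
        seq = self.index.get(key)
        while len(out) < depth and self._live(seq):
            slot = seq % self.size
            out.append(self.addr[slot])
            seq = self.prev[slot]
        return out


def stride_prefetch(ghb, key, degree=2):
    # Assumed policy: prefetch only when the last two deltas for this key agree.
    h = ghb.history(key, 3)
    if len(h) < 3:
        return []
    d1, d2 = h[0] - h[1], h[1] - h[2]
    if d1 != d2 or d1 == 0:
        return []
    return [h[0] + d1 * i for i in range(1, degree + 1)]
```

In this sketch a caller would invoke `insert(pc, addr)` on each cache miss and then `stride_prefetch(ghb, pc)` to obtain candidate prefetch addresses. Other keys (for example, the miss address itself, as in correlation prefetching) could be linked through the same buffer by changing what is passed as `key`.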