Stealth prefetching

Authors:
Jason F. Cantin;Mikko H. Lipasti;James E. Smith
Affiliations:
International Business Machines Corp.;University of Wisconsin, Madison;University of Wisconsin, Madison
Venue:
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Year:
2006

Citing 12
Cited 9

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The sun fireplane system interconnect

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Simulating a $2M Commercial Server on a $2K PC

Computer
Guided region prefetching: a cooperative hardware/software approach

Proceedings of the 30th annual international symposium on Computer architecture
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Filtering Superfluous Prefetches Using Density Vectors

ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Memory-side prefetching for linked data structures for processor-in-memory systems

Journal of Parallel and Distributed Computing
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence

Proceedings of the 32nd annual international symposium on Computer Architecture
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking

Proceedings of the 32nd annual international symposium on Computer Architecture
Data Cache Prefetching Using a Global History Buffer

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Coarse-Grain Coherence Tracking: RegionScout and Region Coherence Arrays

IEEE Micro
Spatial Memory Streaming

Proceedings of the 33rd annual international symposium on Computer Architecture

Predictor virtualization

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Spatio-temporal memory streaming

Proceedings of the 36th annual international symposium on Computer architecture
Performance balancing: software-based on-chip memory management for effective CMP executions

Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
Exploring the prefetcher/memory controller design space: an opportunistic prefetch scheduling strategy

ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Prefetching and cache management using task lifetimes

Proceedings of the 27th international ACM conference on International conference on supercomputing
The reuse cache: downsizing the shared last-level cache

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
WADE: Writeback-aware dynamic cache management for NVM-based main memory system

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Prefetching in shared-memory multiprocessor systems is an increasingly difficult problem. As system designs grow to incorporate larger numbers of faster processors, memory latency and interconnect traffic increase. While aggressive prefetching techniques can mitigate the increasing memory latency, they can harm performance by wasting precious interconnect bandwidth and prematurely accessing shared data, causing state downgrades at remote nodes that force later upgrades.This paper investigates Stealth Prefetching, a new technique that utilizes information from Coarse-Grain Coherence Tracking (CGCT) for prefetching data aggressively, stealthily, and efficiently in a broadcast-based shared-memory multiprocessor system. Stealth Prefetching utilizes CGCT to identify regions of memory that are not shared by other processors, aggressively fetches these lines from DRAM in open-page mode, and moves them close to the processor in anticipation of future references. Our analysis with commercial, scientific, and multiprogrammed workloads show that Stealth Prefetching provides an average speedup of 20% over an aggressive baseline system with conventional prefetching.