Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors

Authors:
F. Dahlgren;P. Stenstrom
Affiliations:
-;-
Venue:
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Year:
1995

Abstract

We study the relative efficiency of previously proposed stride and sequential prefetching, two promising hardware-based prefetching schemes for reducing read-miss penalties in shared-memory multiprocessors. Although stride accesses dominate in four of the six applications we study, we find that sequential prefetching outperforms stride prefetching for three applications. This is because (i) most strides are shorter than the block size (we assume 32-byte blocks), which means that sequential prefetching is as effective for stride accesses, and (ii) sequential prefetching also exploits the locality of read misses for non-stride accesses. However, we find that since stride prefetching causes fewer useless prefetches, it consumes less memory-system bandwidth.
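
The block-size argument in point (i) can be illustrated with a toy model. The sketch below is not the authors' simulator or either of the evaluated hardware schemes: it assumes a single processor, an infinite cache, a prefetch issued on every access, and two idealized prefetchers. The names `simulate`, `sequential`, and `stride` are hypothetical and introduced only for this illustration.

```python
BLOCK_SIZE = 32  # bytes; the block size assumed in the abstract


def sequential(addr, prev_addr, block_size):
    """Toy sequential prefetcher (assumption): always fetch the next block."""
    return [addr // block_size + 1]


def stride(addr, prev_addr, block_size):
    """Toy stride prefetcher (assumption): fetch the block holding addr + stride."""
    if prev_addr is None:
        return []  # no stride is known on the first access
    return [(addr + (addr - prev_addr)) // block_size]


def simulate(addresses, prefetch, block_size=BLOCK_SIZE):
    """Return (read_misses, useless_prefetches) for an infinite cache."""
    cached = set()   # blocks currently in the cache
    unused = set()   # prefetched blocks that were never referenced
    misses = 0
    prev_addr = None
    for addr in addresses:
        block = addr // block_size
        if block not in cached:
            misses += 1
            cached.add(block)
        unused.discard(block)  # a referenced prefetch was useful
        for target in prefetch(addr, prev_addr, block_size):
            if target not in cached:
                cached.add(target)
                unused.add(target)
        prev_addr = addr
    return misses, len(unused)


short = range(0, 128, 8)      # 8-byte stride, shorter than a block
long_ = range(0, 1024, 128)   # 128-byte stride, four blocks apart

print(simulate(short, sequential))  # (1, 1)
print(simulate(short, stride))      # (1, 1)
print(simulate(long_, sequential))  # (8, 8)
print(simulate(long_, stride))      # (2, 1)
```

With an 8-byte stride, consecutive references fall in the same or the next 32-byte block, so the two toy prefetchers behave identically. With a 128-byte stride, the next-block prefetches are never referenced, which is the kind of useless traffic the abstract attributes to sequential prefetching.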