Neighborhood Prefetching on Multiprocessors Using Instruction History

Authors:
David M. Koppelman
Affiliations:
-
Venue:
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Year:
2000


Abstract

A multiprocessor prefetch scheme is described in which a cache miss triggers a prefetch of a group of lines, a neighborhood, surrounding the demand-fetched line. The neighborhood is determined by the data address and by the past behavior of the instruction that missed the cache. An instruction's neighborhood is constructed by recording the offsets of addresses that subsequently miss. Like sequential prefetch, neighborhood prefetching can exploit sequential access, and, to some extent, it can exploit stride access as stride prefetch does. Unlike stride and sequential prefetch, it can also support irregular access patterns. Neighborhood prefetching was compared to adaptive sequential prefetching using execution-driven simulation. Results show that, for six of eight SPLASH-2 benchmarks, neighborhood prefetching yields prefetches that are more useful and lower execution time. Across the eight SPLASH-2 benchmarks, the average normalized execution time is less than 0.9; for three benchmarks, it is less than 0.8.
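
The mechanism outlined in the abstract can be illustrated with a small sketch. This is not the paper's design: the class name `NeighborhoodPrefetcher`, the `radius` and `window` parameters, and the rule that a miss is attributed to every recent triggering miss within `radius` lines are all assumptions made for illustration. The sketch tracks cache-line numbers rather than byte addresses and does not model the cache itself.

```python
class NeighborhoodPrefetcher:
    """Toy model: per-instruction neighborhoods of line offsets (hypothetical)."""

    def __init__(self, radius=4, window=8):
        self.radius = radius      # assumed: largest |offset| (in lines) that is recorded
        self.window = window      # assumed: number of recent misses kept as triggers
        self.neighborhoods = {}   # instruction address (pc) -> set of line offsets
        self.recent = []          # recent misses as (pc, line) pairs, oldest first

    def on_miss(self, pc, line):
        """Record a miss by instruction `pc` on cache line `line`.

        Returns the sorted list of lines to prefetch around `line`.
        """
        # Training: this miss extends the neighborhood of each recent
        # triggering instruction whose missed line was close enough.
        for trig_pc, trig_line in self.recent:
            offset = line - trig_line
            if offset != 0 and abs(offset) <= self.radius:
                self.neighborhoods.setdefault(trig_pc, set()).add(offset)

        # Prediction: apply the missing instruction's recorded offsets
        # to the demand-fetched line.
        offsets = self.neighborhoods.get(pc, set())
        prefetches = sorted(line + off for off in offsets)

        # Remember this miss as a trigger for later training.
        self.recent.append((pc, line))
        if len(self.recent) > self.window:
            self.recent.pop(0)
        return prefetches
```

As a usage example under these assumptions: if the instruction at `0x400` misses on line 100 and misses on lines 102 and 99 follow shortly after, its neighborhood becomes the offsets {+2, -1}. A later miss by `0x400` on line 200 then returns lines 199 and 202, an irregular pattern that neither a sequential nor a single-stride prefetcher would produce.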