Techniques to reduce or tolerate large memory latencies are critical for achieving high processor performance. Hardware data prefetching is one of the most heavily studied solutions, but it is usually applied to first-level caches, where it can severely disrupt processor behavior by delaying normal cache requests, polluting the cache, and occupying the heavily used bus to the second-level cache. In this article, we show that applying hardware data prefetching to the second-level cache retains most of the benefits of first-level cache prefetching with almost none of its drawbacks. Moreover, we show that second-level hardware data prefetching is particularly well suited to out-of-order (OoO) processors: it can hide the long memory latencies caused by second-level cache misses, while OoO execution of memory instructions can hide the shorter latencies of first-level cache misses that hit in the second-level cache. Finally, we show that when the full memory system is taken into account, especially bus traffic, first-level cache prefetching can actually degrade overall processor performance, while second-level cache prefetching consistently improves it. Our experimental results show that the instructions per cycle (IPC) of floating-point programs (SPEC95) increases by 20% on average with second-level cache hardware data prefetching, while it decreases by 5% on average with first-level cache hardware data prefetching.
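To make the mechanism concrete, the following is a minimal, illustrative sketch of the kind of stride-directed hardware prefetcher the abstract discusses, attached here to a toy second-level cache model. It is not the paper's exact design: the class names (`L2Cache`, `StridePrefetcher`), the fully associative LRU organization, the per-PC stride table, and the one-block-ahead prefetch degree are all simplifying assumptions made for this example.

```python
class L2Cache:
    """Toy fully associative, LRU, block-granular second-level cache.
    Tracks demand misses and hits on prefetched blocks. (Illustrative
    model only; real L2 caches are set-associative.)"""

    def __init__(self, num_blocks, block_size=64):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.blocks = []          # LRU order: front = least recently used
        self.misses = 0
        self.prefetch_hits = 0    # demand hits on prefetched blocks
        self.prefetched = set()   # blocks brought in by the prefetcher

    def access(self, addr):
        """Demand access; returns True on hit."""
        blk = addr // self.block_size
        if blk in self.blocks:
            if blk in self.prefetched:       # first demand use of a
                self.prefetch_hits += 1      # prefetched block: a miss
                self.prefetched.discard(blk) # that prefetching hid
            self.blocks.remove(blk)
            self.blocks.append(blk)          # move to MRU position
            return True
        self.misses += 1
        self._insert(blk)
        return False

    def _insert(self, blk):
        if blk in self.blocks:
            return
        if len(self.blocks) >= self.num_blocks:
            evicted = self.blocks.pop(0)     # evict LRU block
            self.prefetched.discard(evicted)
        self.blocks.append(blk)

    def prefetch(self, addr):
        """Bring a block in speculatively, without counting a miss."""
        blk = addr // self.block_size
        if blk not in self.blocks:
            self._insert(blk)
            self.prefetched.add(blk)


class StridePrefetcher:
    """Per-PC stride table: once the same stride is seen twice in a row
    for a load instruction, prefetch one block ahead into the L2."""

    def __init__(self, l2):
        self.l2 = l2
        self.table = {}  # pc -> (last_addr, last_stride)

    def observe(self, pc, addr):
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = (addr, 0)
            return
        last_addr, last_stride = entry
        stride = addr - last_addr
        if stride != 0 and stride == last_stride:
            self.l2.prefetch(addr + stride)  # stride confirmed
        self.table[pc] = (addr, stride)


# Usage: a streaming load (one 64-byte block per iteration) trains the
# prefetcher after two accesses; nearly all later accesses hit on
# prefetched blocks, leaving only the startup misses.
l2 = L2Cache(num_blocks=128)
pf = StridePrefetcher(l2)
for i in range(100):
    addr = i * 64
    l2.access(addr)        # demand access from the first-level cache
    pf.observe(0x400, addr)  # prefetcher observes the L2 access stream
print(l2.misses, l2.prefetch_hits)  # 3 startup misses, 97 hidden misses
```

Because the prefetcher trains on the second-level access stream, its speculative fills contend neither with first-level cache lookups nor with the L1-L2 bus, which is the property the abstract argues makes L2 prefetching safer than L1 prefetching.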