Design Considerations of High Performance Data Cache with Prefetching

Authors:
Chi-Hung Chi;Jun-Li Yuan
Venue:
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Year:
1999

References:

Reducing memory latency via non-blocking and prefetching caches. ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Data prefetching for high-performance processors
Value locality and load value prediction. Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Speculative execution via address prediction and data prefetching. ICS '97 Proceedings of the 11th international conference on Supercomputing
The predictability of data values. MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Load execution latency reduction. ICS '98 Proceedings of the 12th international conference on Supercomputing
Characterization and improvement of load/store cache-based prefetching. ICS '98 Proceedings of the 12th international conference on Supercomputing
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors. IEEE Transactions on Computers
A New Voting Based Hardware Data Prefetch Scheme. HIPC '97 Proceedings of the Fourth International Conference on High-Performance Computing

Abstract

In this paper, we propose a set of four load-balancing techniques to address the memory latency problem of on-chip caches. The first two mechanisms, sequential unification and aggressive lookahead, mainly reduce the chance of partial hits and of accurate prefetch requests being aborted. The latter two mechanisms, default prefetching and cache partitioning, optimize cache performance for unpredictable references. The resulting cache, called the LBD (Load-Balancing Data) cache, is found to have superior performance over a wide range of applications. Simulation of the LBD cache with RPT prefetching (Reference Prediction Table, one of the most cited selective data prefetch schemes [2,3]) on SPEC95 shows a significant reduction in data reference latency, ranging from about 20% to over 90%, with an average of 55.89%. By comparison, prefetch-on-miss and RPT achieve average latency reductions of only 17.37% and 26.05%, respectively.
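
For readers unfamiliar with the RPT baseline named in the abstract, the following is a minimal sketch of the general idea behind a reference-prediction-table style stride prefetcher. It is an illustration under stated assumptions, not the scheme evaluated in the paper and not any of the four LBD mechanisms. The names `RptEntry` and `SimpleRpt`, the unbounded dictionary, and the single `steady` flag (standing in for a multi-state confidence mechanism) are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RptEntry:
    """Per-load-instruction history (hypothetical, simplified layout)."""
    prev_addr: int        # last data address issued by this load
    stride: int = 0       # most recently observed address delta
    steady: bool = False  # True once the same non-zero stride has repeated


class SimpleRpt:
    """Simplified stride predictor indexed by the load's program counter.

    Assumption: the table is an unbounded dict; a hardware table would be
    a small, fixed-size structure with replacement.
    """

    def __init__(self) -> None:
        self.table: Dict[int, RptEntry] = {}

    def access(self, pc: int, addr: int) -> Optional[int]:
        """Record one load and return an address to prefetch, or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this load is seen: no history, no prediction.
            self.table[pc] = RptEntry(prev_addr=addr)
            return None
        new_stride = addr - entry.prev_addr
        # Predict only when the stride matches the previous one.
        entry.steady = new_stride == entry.stride and new_stride != 0
        entry.stride = new_stride
        entry.prev_addr = addr
        return addr + new_stride if entry.steady else None


rpt = SimpleRpt()
predictions = [rpt.access(0x40, a) for a in (100, 108, 116, 124)]
print(predictions)
```

With a load at one program counter touching addresses 100, 108, 116, and 124, this sketch stays silent for the first two accesses and then proposes 124 and 132. An access that breaks the stride yields no prediction, which corresponds to the kind of unpredictable reference the abstract's default prefetching and cache partitioning mechanisms target.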