Multilevel Optimization of Pipelined Caches

Authors:
Kunle Olukotun;Trevor N. Mudge;Richard B. Brown
Venue:
IEEE Transactions on Computers
Year:
1997

Abstract

This paper formulates and shows how to solve the problem of selecting the cache size and depth of cache pipelining that maximize the performance of a given instruction-set architecture. The solution combines trace-driven architectural simulations with timing analysis of the physical implementation of the cache. Increasing cache size tends to improve performance, but this improvement is limited because cache access time increases with cache size. This trade-off results in an optimization problem we refer to as multilevel optimization, because it requires the simultaneous consideration of two levels of machine abstraction: the architectural level and the physical implementation level. The introduction of pipelining permits the use of larger caches without increasing their apparent access time; however, the bubbles caused by load and branch delays limit this technique. In this paper we also show how multilevel optimization can be applied to pipelined systems if software- and hardware-based strategies are considered for hiding the branch and load delays.

The multilevel optimization technique is illustrated with the design of a pipelined cache for a high clock rate MIPS-based architecture. The results of this design exercise show that, because processors with pipelined caches can have shorter CPU cycle times and larger caches, a significant performance advantage is gained by using two or three pipeline stages to fetch data from the cache. Of course, the results are only optimal for the implementation technologies chosen for the design exercise; other choices could result in quite different optimal designs. The exercise is primarily intended to illustrate the steps in the design of pipelined caches using multilevel optimization; however, it also exemplifies the importance of pipelined caches if high clock rate processors are to achieve high performance.
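The trade-off described in the abstract can be sketched as a small search over cache size and pipeline depth. The following is a minimal illustration, not the authors' method or data: every number (miss rates, access times, latch overhead, miss penalty, instruction mix, fraction of unhidden delay slots) is a hypothetical placeholder, and the cost model assumes that each extra cache pipeline stage adds one load delay cycle and one branch delay cycle. In the paper, the architectural inputs come from trace-driven simulation and the timing inputs from analysis of the physical implementation.

```python
from itertools import product

# All values below are hypothetical placeholders, NOT results from the paper.
# Architectural level: misses per instruction for each cache size (KB).
MISS_RATE = {4: 0.080, 8: 0.060, 16: 0.045, 32: 0.035, 64: 0.028}
# Physical level: unpipelined cache access time (ns) for each cache size (KB).
ACCESS_NS = {4: 3.0, 8: 3.5, 16: 4.2, 32: 5.0, 64: 6.0}

MIN_CYCLE_NS = 2.0      # assumed cycle-time floor set by non-cache logic
LATCH_NS = 0.3          # assumed latch overhead per pipeline stage
MISS_PENALTY_NS = 40.0  # assumed fixed miss service time
LOAD_FREQ = 0.25        # assumed fraction of instructions that are loads
BRANCH_FREQ = 0.15      # assumed fraction of instructions that are branches
UNHIDDEN = 0.4          # assumed fraction of delay slots left unfilled

DEPTHS = (1, 2, 3, 4)


def cycle_time_ns(size_kb, depth):
    """CPU cycle time when the cache access is split into `depth` stages."""
    return max(MIN_CYCLE_NS, ACCESS_NS[size_kb] / depth + LATCH_NS)


def time_per_instruction_ns(size_kb, depth):
    """Average time per instruction for one (cache size, depth) design point."""
    cycle = cycle_time_ns(size_kb, depth)
    # Assumption: each extra stage adds one load and one branch delay cycle,
    # of which a fraction UNHIDDEN becomes a pipeline bubble.
    stall_cycles = (depth - 1) * UNHIDDEN * (LOAD_FREQ + BRANCH_FREQ)
    miss_ns = MISS_RATE[size_kb] * MISS_PENALTY_NS
    return cycle * (1.0 + stall_cycles) + miss_ns


def best_design():
    """Exhaustively search both abstraction levels at once."""
    space = product(MISS_RATE, DEPTHS)
    return min(space, key=lambda point: time_per_instruction_ns(*point))


if __name__ == "__main__":
    size_kb, depth = best_design()
    tpi = time_per_instruction_ns(size_kb, depth)
    print(f"best: {size_kb} KB cache, depth {depth}, {tpi:.2f} ns/instruction")
```

Changing `UNHIDDEN` stands in for the software- and hardware-based strategies for hiding load and branch delays: the better the delay slots are hidden, the deeper the pipeline that pays off. With different placeholder values the optimum moves, which mirrors the abstract's caveat that the results depend on the implementation technology chosen.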