Characteristics of performance-optimal multi-level cache hierarchies

Authors:
S. Przybylski;M. Horowitz;J. Hennessy
Affiliations:
Computer Systems Laboratory, Stanford University, Stanford, CA;Computer Systems Laboratory, Stanford University, Stanford, CA;Computer Systems Laboratory, Stanford University, Stanford, CA
Venue:
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Year:
1989

Abstract

The increasing speed of new-generation processors will exacerbate the already large difference between CPU cycle times and main memory access times. As this difference grows, it will become increasingly difficult to build single-level caches that are both fast enough to match these cycle times and large enough to effectively hide the slow main memory access times. One solution to this problem is to use a multi-level cache hierarchy. This paper examines the relationship between cache organization and program execution time for multi-level caches. We show that a first-level cache dramatically reduces the number of references seen by a second-level cache, without having a large effect on the number of second-level cache misses. This reduction in the number of second-level cache hits changes the optimal design point by decreasing the importance of the cycle time of the second-level cache relative to its size. The lower the first-level cache miss rate, the less important the second-level cycle time becomes. This change in the relative importance of cycle time and miss rate makes associativity more attractive and increases the optimal cache size for second-level caches over what it would be for an equivalent single-level cache system.
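
The trade-off described in the abstract can be illustrated with a simple average-access-time model. The sketch below is not the authors' simulation methodology; it is a first-order approximation that assumes every reference pays the first-level access time, only first-level misses pay the second-level access time, and only references that miss in both levels pay the main memory penalty. The function names and all numeric values (for example, the 50-cycle memory penalty) are hypothetical and chosen only for illustration.

```python
"""First-order model of why L2 cycle time matters less behind an L1.

All parameter values are hypothetical illustrations, not figures from the paper.
"""


def single_level_time(t_cache, t_mem, miss_rate):
    """Average cycles per reference with one cache in front of main memory."""
    return t_cache + miss_rate * t_mem


def two_level_time(t_l1, t_l2, t_mem, l1_miss_rate, l2_global_miss_rate):
    """Average cycles per reference with an L1 in front of an L2.

    Assumes every reference pays t_l1, only L1 misses pay t_l2, and only
    references that miss in both levels pay t_mem. l2_global_miss_rate is
    the number of L2 misses divided by all CPU references.
    """
    return t_l1 + l1_miss_rate * t_l2 + l2_global_miss_rate * t_mem


def break_even_miss_reduction(extra_l2_cycles, t_mem, fraction_reaching_l2=1.0):
    """Drop in global miss rate needed to pay for a slower cache.

    Slowing the cache by extra_l2_cycles costs
    fraction_reaching_l2 * extra_l2_cycles per reference, and each avoided
    miss saves t_mem cycles. fraction_reaching_l2 is the L1 miss rate in a
    two-level system, or 1.0 when the cache is used alone.
    """
    return fraction_reaching_l2 * extra_l2_cycles / t_mem


if __name__ == "__main__":
    T_MEM = 50  # hypothetical main memory penalty, in CPU cycles
    for l1_miss_rate in (1.0, 0.2, 0.1, 0.05):
        needed = break_even_miss_reduction(1, T_MEM, l1_miss_rate)
        print(f"L1 miss rate {l1_miss_rate:.2f}: one extra L2 cycle pays off "
              f"if it removes {needed:.4f} misses per reference")
```

Under these assumptions, the cost of one extra second-level cycle scales with the first-level miss rate, while the benefit of each avoided second-level miss does not. A lower first-level miss rate therefore lowers the miss-rate improvement that a larger or more associative second-level cache must deliver to justify its slower cycle time, which is the direction of the result stated in the abstract.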