ATUM: a new technique for capturing address traces using microcode
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Hierarchical cache/bus architecture for shared memory multiprocessors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Analysis of cache performance for operating systems and multiprogramming
Analysis of cache performance for operating systems and multiprogramming
A simulation study of two-level caches
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A Case for Direct-Mapped Caches
Computer
ACM Computing Surveys (CSUR)
Aspects of cache memory and instruction buffer performance
Aspects of cache memory and instruction buffer performance
Performance directed memory hierarchy design
Performance directed memory hierarchy design
Analytical modelling of a hierarchical buffer for a data sharing environment
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
Cache replacement with dynamic exclusion
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Optimal Partitioning of Cache Memory
IEEE Transactions on Computers
Designing the TFP Microprocessor
IEEE Micro
Optimal allocation of on-chip memory for multiple-API operating systems
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A trace-driven simulation methodology
ACM SIGARCH Computer Architecture News
Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Instruction fetching: coping with code bloat
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Improving cache performance with balanced tag and data paths
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
Trace-driven simulations for a two-level cache design in open bus systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The TLB slice—a low-cost high-speed address translation mechanism
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Facilitating level three cache studies using set sampling
Proceedings of the 32nd conference on Winter simulation
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches
IEEE Transactions on Computers
Cache miss behavior: is it √2?
Proceedings of the 3rd conference on Computing frontiers
On multi-level exclusive caching: offline optimality and why promotions are better than demotions
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
A consistency architecture for hierarchical shared caches
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Proceedings of the 7th ACM international conference on Computing frontiers
Co-optimization of memory access and task scheduling on MPSoC architectures with multi-level memory
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
Exploring latency-power tradeoffs in deep nonvolatile memory hierarchies
Proceedings of the 9th conference on Computing Frontiers
Minimizing accumulative memory load cost on multi-core DSPs with multi-level memory
Journal of Systems Architecture: the EUROMICRO Journal
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.01 |
The increasing speed of new generation processors will exacerbate the already large difference between CPU cycle times and main memory access times. As this difference grows, it will be increasingly difficult to build single-level caches that are both fast enough to match these fast cycle times and large enough to effectively hide the slow main memory access times. One solution to this problem is to use a multi-level cache hierarchy. This paper examines the relationship between cache organization and program execution time for multi-level caches. We show that a first-level cache dramatically reduces the number of references seen by a second-level cache, without having a large effect on the number of second-level cache misses. This reduction in the number of second-level cache hits changes the optimal design point by decreasing the importance of the cycle-time of the second-level cache relative to its size. The lower the first-level cache miss rate, the less important the second-level cycle time becomes. This change in relative importance of cycle time and miss rate makes associativity more attractive and increases the optimal cache size for second-level caches over what they would be for an equivalent single-level cache system.