Performance tradeoffs in cache design

Authors:
S. Przybylski;M. Horowitz;J. Hennessy
Affiliations:
Stanford Univ., Stanford, CA;Stanford Univ., Stanford, CA;Stanford Univ., Stanford, CA
Venue:
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer Architecture
Year:
1988

Abstract

Cache memories have become common across a wide range of computer implementations. To date, most analyses of cache performance have concentrated on time-independent metrics such as miss rate and traffic ratio. This paper presents a series of simulations that explore the interactions between various organizational decisions and program execution time. We investigate the tradeoffs between cache size and CPU/cache cycle time, between set associativity and cycle time, and between block size and main memory speed. The results indicate that neither cycle time nor cache size dominates the other across the entire design space. For common implementation technologies, performance is maximized when cache size is increased into the 32 KB to 128 KB range, even at the cost of modest increases in cycle time. If set associativity lengthens the cycle time by more than a few nanoseconds, it increases overall execution time. Because block size and memory transfer rate together determine the cache miss penalty, the block size that minimizes execution time is substantially smaller than the one that minimizes the miss rate. Finally, the interdependence between the optimal cache configuration and main memory speed necessitates multi-level cache hierarchies for high-performance uniprocessors.
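The abstract's central argument is that execution time, not miss rate alone, is the quantity to optimize. The following Python sketch illustrates that reasoning with a deliberately simplified model. It charges one cache cycle per reference plus the expected miss stall, miss_rate × miss penalty, where the penalty is a fixed memory latency plus a per-word transfer time and therefore grows with block size. Everything in it is a hypothetical illustration rather than the authors' simulator or data: the names `MainMemory`, `ns_per_reference`, and `best_block_size`, the memory timings, and all miss rates and cycle times.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MainMemory:
    """Hypothetical main-memory timing in nanoseconds (illustrative values only)."""
    latency_ns: float   # time until the first word of a block arrives
    ns_per_word: float  # transfer time per word once the transfer starts

    def miss_penalty_ns(self, block_words: int) -> float:
        # Fixed latency plus transfer time, so larger blocks cost more per miss.
        return self.latency_ns + block_words * self.ns_per_word


def ns_per_reference(cycle_ns: float, miss_rate: float, penalty_ns: float) -> float:
    # One cache cycle per reference plus the expected miss stall.
    return cycle_ns + miss_rate * penalty_ns


def best_block_size(miss_rates: dict[int, float], cycle_ns: float,
                    memory: MainMemory) -> int:
    # Minimize time per reference rather than miss rate alone.
    return min(miss_rates, key=lambda words: ns_per_reference(
        cycle_ns, miss_rates[words], memory.miss_penalty_ns(words)))


# Made-up miss rates per block size (in words); NOT measurements from the paper.
MISS_RATES = {2: 0.060, 4: 0.040, 8: 0.030, 16: 0.025, 32: 0.024, 64: 0.026}
MEMORY = MainMemory(latency_ns=200.0, ns_per_word=20.0)

if __name__ == "__main__":
    lowest_miss = min(MISS_RATES, key=MISS_RATES.get)
    fastest = best_block_size(MISS_RATES, cycle_ns=20.0, memory=MEMORY)
    print(f"lowest miss rate at {lowest_miss}-word blocks")
    print(f"lowest execution time at {fastest}-word blocks")

    # Size vs. cycle time: a larger, slower cache can still win (hypothetical numbers).
    penalty = MEMORY.miss_penalty_ns(8)
    small = ns_per_reference(cycle_ns=20.0, miss_rate=0.05, penalty_ns=penalty)
    large = ns_per_reference(cycle_ns=23.0, miss_rate=0.02, penalty_ns=penalty)
    # Associativity vs. cycle time: fewer misses may not pay for a longer cycle.
    assoc = ns_per_reference(cycle_ns=28.0, miss_rate=0.017, penalty_ns=penalty)
    print(f"small: {small:.1f} ns/ref, large: {large:.1f} ns/ref, "
          f"large set-associative: {assoc:.2f} ns/ref")
```

With these invented numbers, the miss rate bottoms out at 32-word blocks, but time per reference is lowest at 8-word blocks, because the longer transfer outweighs the few extra hits. The same cost function also shows a larger cache with a 3 ns slower cycle beating a smaller one, and a 5 ns associativity penalty erasing the benefit of fewer misses. Because memory timing is expressed in nanoseconds rather than CPU cycles, changing the cycle time does not silently change the miss penalty. The real tradeoff curves depend on workload and technology, and the paper derives them by simulation.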