A Case for Direct-Mapped Caches
Computer
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
The effect of sharing on the cache and bus performance of parallel programs
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Characteristics of performance-optimal multi-level cache hierarchies
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Inexpensive implementations of set-associativity
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Techniques for efficient inline tracing on a shared-memory multiprocessor
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The Evolution of Instruction Sequencing
Computer - Special issue on instruction sequencing
Data cache performance of supercomputer applications
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
On the validity of trace-driven simulation for multiprocessors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Cache behavior of combinator graph reduction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Evaluating Design Choices for Shared Bus Multiprocessors in a Throughput-Oriented Environment
IEEE Transactions on Computers
Cache replacement with dynamic exclusion
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Evaluating performance of prefetching second level caches
ACM SIGMETRICS Performance Evaluation Review
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The effectiveness of caches for vector processors
ICS '94 Proceedings of the 8th international conference on Supercomputing
A unified architectural tradeoff methodology
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Cache designs with partial address matching
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A trace-driven simulation methodology
ACM SIGARCH Computer Architecture News
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Increasing cache port efficiency for dynamic superscalar microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
A quantitative analysis of loop nest locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Synchronization hardware for networks of workstations: performance vs. cost
ICS '96 Proceedings of the 10th international conference on Supercomputing
Architecture Technique Trade-Offs Using Mean Memory Delay Time
IEEE Transactions on Computers
The selection of optimal cache lines for microprocessor-based controllers
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
A memory management unit and cache controller for the MARS system
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture
A Software Approach to Avoiding Spatial Cache Collisions in Parallel Processor Systems
IEEE Transactions on Parallel and Distributed Systems
The performance impact of block sizes and fetch strategies
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Trace-driven simulations for a two-level cache design in open bus systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
Cache performance for multimedia applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Application-adaptive intelligent cache memory system
ACM Transactions on Embedded Computing Systems (TECS)
Reducing Cache Conflicts by Multi-Level Cache Partitioning and Array Elements Mapping
The Journal of Supercomputing
ACM Transactions on Design Automation of Electronic Systems (TODAES)
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
Selective Victim Caching: A Method to Improve the Performance of Direct-Mapped Caches
IEEE Transactions on Computers
The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
A Design Frame for Hybrid Access Cashes
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
A Cache Coherency Protocol for Optically Connected Parallel Computer Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
The impact of extrinsic cache performance on predictability of real-time systems
RTCSA '95 Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications
Cache miss behavior: is it √2?
Proceedings of the 3rd conference on Computing frontiers
ACM Transactions on Architecture and Code Optimization (TACO)
Performance Issues of a Superscalar Microprocessor
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Partial address directory for cache access
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.03 |
Cache memories have become common across a wide range of computer implementations. To date, most analyses of cache performance have concentrated on time independent metrics, such as miss rate and traffic ratio. This paper presents a series of simulations that explore the interactions between various organizational decisions and program execution time. We investigate the tradeoffs between cache size and CPU/Cache cycle time, set associativity and cycle time, and between block size and main memory speed. The results indicate that neither cycle time nor cache size dominates the other across the entire design space. For common implementation technologies, performance is maximized when the size is increased to the 32KB to 128KB range with modest penalties to the cycle time. If set associativity impacts the cycle time by more than a few nanoseconds, it increases overall execution time. Since the block size and memory transfer rate combine to affect the cache miss penalty, the optimum block size is substantially smaller than that which minimizes the miss rate. Finally, the interdependence between optimal cache configuration and the main memory speed necessitates multi-level cache hierarchies for high performance uniprocessors.