Sequentiality and prefetching in database systems
ACM Transactions on Database Systems (TODS)
Cache evaluation and the impact of workload choice
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Cache Performance in the VAX-11/780
ACM Transactions on Computer Systems (TOCS)
Interference in multiprocessor computer systems with interleaved memory
Communications of the ACM
Performance '84 Proceedings of the Tenth International Symposium on Computer Performance Modelling, Measurement and Evaluation
Cache memories for PDP-11 family computers
ISCA '76 Proceedings of the 3rd annual symposium on Computer architecture
Experimental evaluation of on-chip microprocessor cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
An instruction timing model of CPU performance
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Memory Hierarchy Aspects of a Multiprocessor RISC: Cache and Bus Analyses
Memory Hierarchy Aspects of a Multiprocessor RISC: Cache and Bus Analyses
The Memory Architecture and the Cache and Memory Management Unit for
The Memory Architecture and the Cache and Memory Management Unit for
Computer
IEEE Micro
Considerations in block-oriented systems design
AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
AFIPS '70 (Fall) Proceedings of the November 17-19, 1970, fall joint computer conference
An inside look at the Z80,000 CPU: Zilog's new 32-bit microprocessor
AFIPS '84 Proceedings of the July 9-12, 1984, national computer conference and exposition
An architectural perspective on a memory access controller
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
An analysis of Memnet—an experiment in high-speed shared-memory local networking
SIGCOMM '88 Symposium proceedings on Communications architectures and protocols
A simulation study of two-level caches
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Analysis of bus hierarchies for multiprocessors
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Cache performance of vector processors
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A Case for Direct-Mapped Caches
Computer
The effect of sharing on the cache and bus performance of parallel programs
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Evaluating the performance of four snooping cache coherency protocols
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Multi-level shared caching techniques for scalability in VMP-M/C
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Cache performance of the integer SPEC benchmarks on a RISC
ACM SIGARCH Computer Architecture News
GT-EP: a novel high-performance real-time architecture
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data prefetching in multiprocessor vector cache memories
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Implementing a cache for a high-performance GaAs microprocessor
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Classification and performance evaluation of instruction buffering techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
On reconfigurable on-chip data caches
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
A conflict-free memory design for multiprocessors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Subprogram Inlining: A Study of its Effects on Program Execution Time
IEEE Transactions on Software Engineering
Evaluating Design Choices for Shared Bus Multiprocessors in a Throughput-Oriented Environment
IEEE Transactions on Computers
Performance evaluation of a decoded instruction cache for variable instruction-length computers
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Memory latency effects in decoupled architectures with a single data memory module
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A study of I/O system organizations
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A Model of Workloads and its Use in Miss-Rate Prediction for Fully Associative Caches
IEEE Transactions on Computers
Optimal Partitioning of Cache Memory
IEEE Transactions on Computers
Sparse matrix computations: implications for cache designs
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Improving the cache locality of memory allocation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Evaluating performance of prefetching second level caches
ACM SIGMETRICS Performance Evaluation Review
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
A case for two-way skewed-associative caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
ICS '93 Proceedings of the 7th international conference on Supercomputing
The effectiveness of caches for vector processors
ICS '94 Proceedings of the 8th international conference on Supercomputing
A unified architectural tradeoff methodology
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Decoupled sectored caches: conciliating low tag implementation cost
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Active memory: a new abstraction for memory-system simulation
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
A quantitative analysis of loop nest locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Architecture Technique Trade-Offs Using Mean Memory Delay Time
IEEE Transactions on Computers
Active memory: a new abstraction for memory system simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
An efficient caching support for critical sections in large-scale shared-memory multiprocessors
ICS '90 Proceedings of the 4th international conference on Supercomputing
The selection of optimal cache lines for microprocessor-based controllers
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Run-time spatial locality detection and optimization
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Exploiting spatial locality in data caches using spatial footprints
Proceedings of the 25th annual international symposium on Computer architecture
CPU Cache Prefetching: Timing Evaluation of Hardware Implementations
IEEE Transactions on Computers
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
A program-driven simulation model of an MIMD multiprocessor
ANSS '91 Proceedings of the 24th annual symposium on Simulation
The performance impact of block sizes and fetch strategies
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Trace-driven simulations for a two-level cache design in open bus systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
Cache performance for multimedia applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Designing a Modern Memory Hierarchy with Hardware Prefetching
IEEE Transactions on Computers
Correction to "Line (Block) Size Choice for CPU Cache Memories"
IEEE Transactions on Computers
Branch Target Buffer Design and Optimization
IEEE Transactions on Computers
A Quantitative Evaluation of Cache Types for High-Performance Computer Systems
IEEE Transactions on Computers
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
Memory Latency Effects in Decoupled Architectures
IEEE Transactions on Computers
Performance Evaluation of a Decoded Instruction Cache for Variable Instruction Length Computers
IEEE Transactions on Computers
Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes
IEEE Transactions on Computers
Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps
IEEE Transactions on Parallel and Distributed Systems
The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
A Cache Coherency Protocol for Optically Connected Parallel Computer Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Evaluation of cache consistency algorithm performance
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
The impact of extrinsic cache performance on predictability of real-time systems
RTCSA '95 Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications
Encyclopedia of Computer Science
Increasing cache capacity through word filtering
Proceedings of the 21st annual international conference on Supercomputing
Variable-sized object packing and its applications to instruction cache design
Computers and Electrical Engineering
Computer performance analysis and the Pi Theorem
Computer Science - Research and Development
Hi-index | 15.03 |
The line (block) size of a cache memory is one of the parameters that most strongly affects cache performance. In this paper, we study the factors that relate to the selection of a cache line size. Our primary focus is on the cache miss ratio, but we also consider influences such as logic complexity, address tags, line crossers, I/O overruns, etc. The behavior of the cache miss ratio as a function of line size is examined carefully through the use of trace driven simulation, using 27 traces from five different machine architectures. The change in cache miss ratio as the line size varies is found to be relatively stable across workloads, and tables of this function are presented for instruction caches, data caches, and unified caches. An empirical mathematical fit is obtained. This function is used to extend previously published design target miss ratios to cover line sizes from 4 to 128 bytes and cache sizes from 32 bytes to 32K bytes; design target miss ratios are to be used to guide new machine designs. Mean delays per memory reference and memory (bus) traffic rates are computed as a function of line and cache size, and memory access time parameters. We find that for high performance microprocessor designs, line sizes in the range 16-64 bytes seem best; shorter line sizes yield high delays due to memory latency, although they reduce memory traffic somewhat. Longer line sizes are suitable for mainframes because of the higher bandwidth to main memory.