Experimental evaluation of on-chip microprocessor cache memories

Authors:
Mark D. Hill;Alan Jay Smith
Affiliations:
Computer Science Division, Department of Electrical Engineering and Computer Science, University of California, Berkeley, California;Computer Science Division, Department of Electrical Engineering and Computer Science, University of California, Berkeley, California
Venue:
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Year:
1984

Abstract

Advances in integrated circuit density are permitting the implementation on a single chip of functions and performance enhancements beyond those of a basic processor. One performance enhancement of proven value is a cache memory; placing a cache on the processor chip can reduce both mean memory access time and bus traffic. In this paper we use trace-driven simulation to study design tradeoffs for small (on-chip) caches. Miss ratio and traffic ratio (bus traffic) are the metrics for cache performance. Particular attention is paid to sub-block caches (also known as sector caches), in which address tags are associated with blocks, each of which contains multiple sub-blocks; sub-blocks are the transfer unit. Using traces from two 16-bit architectures (Z8000, PDP-11) and two 32-bit architectures (VAX-11, System/370), we find that general-purpose caches of 64 bytes (net size) are marginally useful in some cases, while 1024-byte caches perform fairly well; typical miss and traffic ratios for a 1024-byte (net size) cache, 4-way set associative with 8-byte blocks, are: PDP-11: .039, .156; Z8000: .015, .060; VAX-11: .080, .160; System/370: .244, .489. (These figures are based on traces of user programs, and the performance obtained in practice is likely to be worse.) The use of sub-blocks allows tradeoffs between miss ratio and traffic ratio for a given cache size. Load forward is quite useful. Extensive simulation results are presented.
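
The tradeoff between miss ratio and traffic ratio can be illustrated with a minimal trace-driven sketch. It is not the authors' simulator. It rests on several assumptions that the abstract does not state: the trace contains reads only, replacement is LRU, a miss fetches only the one missing sub-block (no load forward), and traffic ratio is taken as bytes fetched with the cache divided by the bytes that would cross the bus if every reference transferred one word. The function name `simulate` and all of its parameters are hypothetical.

```python
from collections import OrderedDict

def simulate(trace, cache_bytes=1024, block_bytes=8,
             sub_block_bytes=8, ways=4, word_bytes=2):
    """Return (miss_ratio, traffic_ratio) for a list of byte addresses.

    Assumed model (not taken from the paper): reads only, LRU
    replacement, demand fetch of a single sub-block per miss.
    cache_bytes is the net (data-only) size.
    """
    num_sets = cache_bytes // (block_bytes * ways)
    # One OrderedDict per set: tag -> set of valid sub-block indices.
    # Insertion order doubles as LRU order (oldest entry first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    bytes_fetched = 0

    for addr in trace:
        block_no = addr // block_bytes
        set_idx = block_no % num_sets
        tag = block_no // num_sets
        sub = (addr % block_bytes) // sub_block_bytes
        blocks = sets[set_idx]

        if tag in blocks:
            blocks.move_to_end(tag)          # mark as most recently used
            if sub in blocks[tag]:
                continue                     # hit: tag and sub-block valid
        else:
            if len(blocks) >= ways:
                blocks.popitem(last=False)   # evict the LRU block
            blocks[tag] = set()              # new tag, no valid sub-blocks

        # Miss: transfer only the missing sub-block over the bus.
        blocks[tag].add(sub)
        misses += 1
        bytes_fetched += sub_block_bytes

    n = len(trace)
    return misses / n, bytes_fetched / (n * word_bytes)

# Synthetic trace: touch every other 2-byte word of a 512-byte region, twice.
trace = list(range(0, 512, 4)) * 2
print(simulate(trace, sub_block_bytes=8))  # (0.25, 1.0)
print(simulate(trace, sub_block_bytes=2))  # (0.5, 0.5)
```

On this synthetic trace, shrinking the sub-block from 8 bytes to 2 bytes doubles the miss ratio but halves the traffic ratio, which is the kind of tradeoff the abstract describes. Real address traces would of course behave differently.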