Cache write policies and performance

Authors:
Norman P. Jouppi
Venue:
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Year:
1993

Abstract

This paper investigates issues involving writes and caches. First, it examines tradeoffs on writes that miss in the cache: whether the missed cache block is fetched on a write miss, whether the missed cache block is allocated in the cache, and whether the cache line is written before a hit or miss is known. Depending on the combination of these policies chosen, the overall cache miss rate can vary by a factor of two on some applications. The combination of no-fetch-on-write and write-allocate can provide better performance than cache line allocation instructions. Second, it considers tradeoffs between write-through and write-back caching when writes hit in a cache. A mixture of these two alternatives, called write caching, is proposed. Write caching places a small fully-associative cache behind a write-through cache. A write cache can eliminate almost as much write traffic as a write-back cache.
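
The write-caching idea described in the abstract can be illustrated with a small model. The following is a minimal sketch, not the paper's simulator: it assumes LRU replacement, an 8-byte line, and that every store from the write-through cache passes through the write cache. The names `WriteCache`, `write`, `flush`, and `traffic_removed` are hypothetical and chosen only for this illustration.

```python
from collections import OrderedDict


class WriteCache:
    """Small fully-associative buffer placed behind a write-through cache.

    Assumptions for this sketch: LRU replacement, and one write to the
    next memory level per evicted (or flushed) line.
    """

    def __init__(self, entries, line_bytes=8):
        if entries < 1:
            raise ValueError("entries must be at least 1")
        self.entries = entries
        self.line_bytes = line_bytes
        self.lines = OrderedDict()  # line number -> True, ordered LRU to MRU
        self.writes_out = 0         # line writes sent on to the next level

    def write(self, addr):
        line = addr // self.line_bytes
        if line in self.lines:
            # The store merges into a line already held: no new traffic.
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.entries:
            # Evict the least recently used line and write it out.
            self.lines.popitem(last=False)
            self.writes_out += 1
        self.lines[line] = True

    def flush(self):
        # Drain every remaining line to the next level.
        self.writes_out += len(self.lines)
        self.lines.clear()


def traffic_removed(addrs, entries, line_bytes=8):
    """Fraction of stores that never reach the next level.

    The baseline is a plain write-through cache, which forwards every store.
    """
    wc = WriteCache(entries, line_bytes)
    for addr in addrs:
        wc.write(addr)
    wc.flush()
    return 1 - wc.writes_out / len(addrs)


if __name__ == "__main__":
    stores = [0, 4, 0, 8, 0, 64, 4]  # made-up byte addresses
    print(f"{traffic_removed(stores, entries=2):.2f}")
```

In this toy trace, repeated stores to the same line are merged in the buffer, which is the mechanism by which a write cache reduces write traffic. The addresses are invented for illustration and say nothing about the figures reported in the paper.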