Decoupled sectored caches: conciliating low tag implementation cost

Authors:
A. Seznec
Affiliations:
IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France
Venue:
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Year:
1994

Abstract

Sectored caches have been used for many years to reconcile a small tag array with a small or medium block size. In a sectored cache, a single address tag is associated with a sector consisting of several cache lines, while validity, dirty and coherency tags are associated with each of the inner cache lines.

Maintaining a small tag array is a major issue in many cache designs (e.g. L2 caches). Using a sectored cache is a design trade-off between a small tag array, which is possible with a large line size, and low memory traffic, which requires a small line size. This technique has been used in many cache designs, including small on-chip microprocessor caches and large external second-level caches. Unfortunately, on some applications the miss ratio of a sectored cache is significantly higher than that of a non-sectored cache (factors higher than two are commonly observed), so a significant part of the potential performance may be lost to miss penalties.

Usually in a cache, a cache line location is statically linked to one and only one address tag word location. In the decoupled sectored cache introduced in this paper, this monolithic association is broken: the address tag location associated with a cache line location is dynamically chosen at fetch time among several possible locations.

The tag volume of a decoupled sectored cache is in the same range as that of a traditional sectored cache, but its hit ratio is very close to that of a non-sectored cache. A decoupled sectored cache therefore allows the same level of performance as a non-sectored cache at a significantly lower hardware cost.
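
The decoupling idea can be illustrated with a small trace-driven model. The sketch below is a minimal Python model under stated assumptions, not the paper's implementation. It assumes a direct-mapped organization, LRU choice among the address tags of a frame on a sector miss, and an illustrative geometry of 4 frames, 4 lines per sector and 16-byte lines. The class name `DecoupledSectoredCache`, the parameter `tags_per_frame` and the helper `run` are hypothetical. Each line position keeps a small pointer naming the address tag it belongs to. With `tags_per_frame = 1` the model degenerates to a conventional sectored cache, so the same code contrasts the two organizations on a trace that alternates between two sectors mapping to the same frame.

```python
from typing import List, Optional, Tuple


class DecoupledSectoredCache:
    """Hypothetical direct-mapped decoupled sectored cache (sketch only).

    Each sector frame has `lines_per_sector` line positions and
    `tags_per_frame` address tags. Every line position stores a pointer
    to the tag slot that owns it, so lines from different sectors can
    share one frame. tags_per_frame == 1 gives a conventional sectored cache.
    """

    def __init__(self, num_frames: int, lines_per_sector: int,
                 line_size: int, tags_per_frame: int) -> None:
        self.num_frames = num_frames
        self.k = lines_per_sector
        self.line_size = line_size
        # Address tag array: tags_per_frame slots per frame; None = unused.
        self.tags: List[List[Optional[int]]] = [
            [None] * tags_per_frame for _ in range(num_frames)]
        # Per-frame recency order of tag slots, most recent first
        # (LRU replacement is an assumption of this sketch).
        self.lru: List[List[int]] = [
            list(range(tags_per_frame)) for _ in range(num_frames)]
        # Per-line pointer to the owning tag slot; None means invalid.
        self.ptr: List[List[Optional[int]]] = [
            [None] * lines_per_sector for _ in range(num_frames)]
        self.hits = 0
        self.misses = 0

    def _decode(self, addr: int) -> Tuple[int, int, int]:
        line_addr = addr // self.line_size
        line = line_addr % self.k            # line position inside the sector
        sector = line_addr // self.k
        frame = sector % self.num_frames     # direct-mapped frame index
        tag = sector // self.num_frames      # one address tag per sector
        return frame, line, tag

    def _touch(self, frame: int, slot: int) -> None:
        self.lru[frame].remove(slot)
        self.lru[frame].insert(0, slot)

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fetch only the missing line."""
        frame, line, tag = self._decode(addr)
        tags, ptr = self.tags[frame], self.ptr[frame]
        slot = tags.index(tag) if tag in tags else None
        if slot is not None and ptr[line] == slot:
            self.hits += 1
            self._touch(frame, slot)
            return True
        self.misses += 1
        if slot is None:
            # Sector miss: reuse the least recently used tag slot and
            # invalidate only the lines that pointed to the evicted tag.
            slot = self.lru[frame][-1]
            for i in range(self.k):
                if ptr[i] == slot:
                    ptr[i] = None
            tags[slot] = tag
        # The fetched line overwrites whatever occupied its position and
        # records which tag slot owns it.
        ptr[line] = slot
        self._touch(frame, slot)
        return False


def run(trace: List[int], tags_per_frame: int) -> Tuple[int, int]:
    """Replay a trace of byte addresses; return (hits, misses)."""
    cache = DecoupledSectoredCache(num_frames=4, lines_per_sector=4,
                                   line_size=16, tags_per_frame=tags_per_frame)
    for addr in trace:
        cache.access(addr)
    return cache.hits, cache.misses


if __name__ == "__main__":
    # Line 0 of one sector and line 1 of another sector that maps to the
    # same frame (256 bytes apart with this geometry), used alternately.
    trace = [0, 272] * 4
    print("sectored  (1 tag/frame):", run(trace, 1))   # (0, 8)
    print("decoupled (2 tags/frame):", run(trace, 2))  # (6, 2)
```

Under these assumptions the conventional sectored cache misses on every access of the alternating trace, because each sector miss invalidates the whole frame, while the decoupled version misses only on the two cold fetches. In this sketch the extra storage over a conventional sectored cache is `tags_per_frame - 1` additional address tags per frame plus a pointer of ⌈log2(`tags_per_frame`)⌉ bits per line; the paper's exact organization and replacement policy may differ.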