Line (block) size choice for CPU cache memories
IEEE Transactions on Computers
Cache design of a sub-micron CMOS system/370
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
On the inclusion properties for multi-level cache hierarchies
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
A Case for Direct-Mapped Caches
Computer
The effect of sharing on the cache and bus performance of parallel programs
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
MIPS RISC architectures
Second bibliography on Cache memories
ACM SIGARCH Computer Architecture News
Bibliography and reading on CPU cache memories and related topics
ACM SIGARCH Computer Architecture News
Analysis of Cache Performance for Operating Systems and Multiprogramming
Analysis of Cache Performance for Operating Systems and Multiprogramming
Using cache memory to reduce processor-memory traffic
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
CAT—caching address tags: a technique for reducing area cost of on-chip caches
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Don't use the page number, but a pointer to it
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
IEEE Transactions on Computers
Minimizing Area Cost of On-Chip Cache Memories by Caching Address Tags
IEEE Transactions on Computers
Exploiting spatial locality in data caches using spatial footprints
Proceedings of the 25th annual international symposium on Computer architecture
Investigating optimal local memory performance
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
An Algorithm for Optimally Exploiting Spatial and Temporal Locality in Upper Memory Levels
IEEE Transactions on Computers - Special issue on cache memory and related problems
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
Limited Bandwidth to Affect Processor Design
IEEE Micro
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Two-Level Address Storage and Address Prediction (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Two techniques for improving performance on bus-based multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Design and Optimization of Large Size and Low Overhead Off-Chip Caches
IEEE Transactions on Computers
Decode filter cache for energy efficient instruction cache hierarchy in super scalar architectures
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking
Proceedings of the 32nd annual international symposium on Computer Architecture
A predictive decode filter cache for reducing power consumption in embedded processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Increasing cache capacity through word filtering
Proceedings of the 21st annual international conference on Supercomputing
Proceedings of the 23rd international conference on Supercomputing
Decoupled zero-compressed memory
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput
Proceedings of the 38th annual international symposium on Computer architecture
Moguls: a model to explore the memory hierarchy for bandwidth improvements
Proceedings of the 38th annual international symposium on Computer architecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
The dynamic granularity memory system
Proceedings of the 39th Annual International Symposium on Computer Architecture
MAGE: adaptive granularity and ECC for resilient and power efficient memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 40th Annual International Symposium on Computer Architecture
Protozoa: adaptive granularity cache coherence
Proceedings of the 40th Annual International Symposium on Computer Architecture
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
A locality-aware memory hierarchy for energy-efficient GPU architectures
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
DESC: energy-efficient data exchange using synchronized counters
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The reuse cache: downsizing the shared last-level cache
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.01 |
Sectored caches have been used for many years in order to reconcile low tag array size and small or medium block size. In a sectored cache, a single address tag is associated with a sector consisting on several cache lines, while validity, dirty and coherency tags are associated with each of the inner cache lines.Maintaining a low tag array size is a major issue in many cache designs (e.g. L2 caches). Using a sectored cache is a design trade-off between a low size of the tag array which is possible with large line size and a low memory traffic which requires a small line size.This technique has been used in many cache designs including small on-chip microprocessor caches and large external second level caches. Unfortunately, as on some applications, the miss ratio on a sectored cache is significantly higher than the miss ratio on a non-sectored cache (factors higher than two are commonly observed), a significant part of the potential performance may be wasted in miss penalties.Usually in a cache, a cache line location is statically linked to one and only one address tag word location. In the decoupled sectored cache we introduce in this paper, this monolithic association is broken; the address tag location associated with a cache line location is dynamically chosen at fetch time among several possible locations.The tag volume on a decoupled sectored cache is in the same range as the tag volume in a traditional sectored cache; but the hit ratio on a decoupled sectored cache is very close to the hit ratio on a non-sectored cache. A decoupled sectored cache will allow the same level of performance as a non-sectored cache, but at a significantly lower hardware cost.