The traditional approach to implementing wide set-associativity is expensive, requiring a wide tag memory (directory) and many comparators. Here we examine alternative implementations of associativity that use hardware similar to that used to implement a direct-mapped cache. One approach scans tags serially from most-recently used to least-recently used. Another uses a partial compare of a few bits from each tag to reduce the number of tags that must be examined serially. The drawback of both approaches is that they increase cache access time by a factor of two or more over the traditional implementation of set-associativity, making them inappropriate for cache designs in which a fast access time is crucial (e.g., level-one caches and caches that directly service processor requests).

These schemes are useful, however, if (1) the low miss ratio of wide set-associative caches is desired, (2) the low cost of a direct-mapped implementation is preferred, and (3) the slower access time of these approaches can be tolerated. We expect these conditions to hold for caches in multiprocessors designed to reduce memory interconnection traffic, caches implemented with large, narrow memory chips, and level-two (or higher) caches in a cache hierarchy.
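The two lookup schemes described above can be illustrated with a small software model of a single cache set. This is only a sketch under stated assumptions, not the hardware design from the paper: the function names, the choice of the low-order tag bits for the partial compare, and the "probes" count (used here as a stand-in for serial tag reads) are all hypothetical.

```python
from typing import List, Optional, Tuple


def mru_serial_lookup(tags_in_mru_order: List[int], tag: int) -> Tuple[Optional[int], int]:
    """Scan stored tags one at a time, most-recently used first.

    Returns (position, probes): the position of the matching tag in MRU
    order (None on a miss) and the number of full tag comparisons made.
    """
    for probes, stored in enumerate(tags_in_mru_order, start=1):
        if stored == tag:
            return probes - 1, probes
    # Miss: every tag in the set had to be examined.
    return None, len(tags_in_mru_order)


def partial_compare_lookup(tags: List[int], tag: int, partial_bits: int) -> Tuple[Optional[int], int]:
    """Filter the set with a partial compare, then scan the survivors serially.

    Assumption: the partial compare uses the low-order `partial_bits` bits
    of each tag and is modeled as a single parallel step.
    Returns (way, probes), where probes counts full tag comparisons only.
    """
    mask = (1 << partial_bits) - 1
    candidates = [way for way, stored in enumerate(tags)
                  if (stored & mask) == (tag & mask)]
    for probes, way in enumerate(candidates, start=1):
        if tags[way] == tag:
            return way, probes
    # Miss: only the partial-compare survivors were examined in full.
    return None, len(candidates)


# A 4-way set; in the MRU scheme the list order is the recency order.
example_set = [0x3A, 0x15, 0x2A, 0x07]

mru_hit = mru_serial_lookup(example_set, 0x2A)
mru_miss = mru_serial_lookup(example_set, 0x4B)
partial_hit = partial_compare_lookup(example_set, 0x2A, partial_bits=4)
partial_miss = partial_compare_lookup(example_set, 0x4B, partial_bits=4)
```

In this model a hit on the third most-recently used block costs three serial comparisons under the MRU scan, while the 4-bit partial compare narrows the same lookup to two candidates. A miss costs a full scan of the set under the MRU scheme but may cost no full comparisons at all when no partial tag matches.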