Inexpensive implementations of set-associativity

Authors:
R. E. Kessler; R. Jooss; A. Lebeck; M. D. Hill
Affiliations:
University of Wisconsin, Computer Sciences Department, Madison, Wisconsin; University of Wisconsin, Computer Sciences Department, Madison, Wisconsin; University of Wisconsin, Computer Sciences Department, Madison, Wisconsin; University of Wisconsin, Computer Sciences Department, Madison, Wisconsin
Venue:
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Year:
1989

Abstract

The traditional approach to implementing wide set-associativity is expensive, requiring a wide tag memory (directory) and many comparators. Here we examine alternative implementations of associativity that use hardware similar to that used to implement a direct-mapped cache. One approach scans tags serially from most recently used to least recently used. Another uses a partial compare of a few bits from each tag to reduce the number of tags that must be examined serially. The drawback of both approaches is that they increase cache access time by a factor of two or more over the traditional implementation of set-associativity, making them inappropriate for cache designs in which a fast access time is crucial (e.g., level-one caches that directly service processor requests). These schemes are useful, however, if (1) the low miss ratio of wide set-associative caches is desired, (2) the low cost of a direct-mapped implementation is preferred, and (3) the slower access time of these approaches can be tolerated. We expect these conditions to hold for caches in multiprocessors designed to reduce memory interconnection traffic, caches implemented with large, narrow memory chips, and level-two (or higher) caches in a cache hierarchy.
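The two serial schemes described above can be illustrated with a small Python model of a single cache set. This is a hypothetical sketch, not the authors' hardware design: it assumes LRU replacement, counts one probe for each full-tag read and compare, assumes that a single probe can read the low-order k bits of every tag in the set at once for the partial compare, and uses invented names (`SerialSet`, `mru_scan`, `partial_compare`).

```python
from dataclasses import dataclass, field


@dataclass
class SerialSet:
    """One set of an a-way cache whose tags are examined serially.

    Hypothetical cost model: each full-tag read/compare costs one probe,
    and tags are kept in MRU-to-LRU order (index 0 is most recently used).
    """
    ways: int
    tags: list[int] = field(default_factory=list)

    def _touch(self, pos, tag):
        # Move the referenced tag to the MRU position (LRU replacement).
        if pos is not None:
            self.tags.pop(pos)          # hit: remove it from its old position
        elif len(self.tags) == self.ways:
            self.tags.pop()             # miss in a full set: evict the LRU tag
        self.tags.insert(0, tag)

    def mru_scan(self, tag):
        """Scan full tags serially from MRU to LRU. Returns (hit, probes)."""
        probes = 0
        for pos, t in enumerate(self.tags):
            probes += 1
            if t == tag:
                self._touch(pos, tag)
                return True, probes
        # A miss has examined every tag currently in the set.
        self._touch(None, tag)
        return False, probes

    def partial_compare(self, tag, k):
        """Compare the low k bits of all tags in one probe, then check full
        tags serially (in MRU order) only for ways whose partial bits match.
        Returns (hit, probes)."""
        mask = (1 << k) - 1
        probes = 1  # assumed: one access reads the k-bit partial tags of all ways
        candidates = [pos for pos, t in enumerate(self.tags)
                      if (t & mask) == (tag & mask)]
        for pos in candidates:
            probes += 1
            if self.tags[pos] == tag:
                self._touch(pos, tag)
                return True, probes
        # No candidate matched: either no partial match or only false matches.
        self._touch(None, tag)
        return False, probes
```

For example, after filling a 4-way set with tags 0x10, 0x21, 0x32 and 0x43 (so the MRU-to-LRU order is 0x43, 0x32, 0x21, 0x10), `mru_scan(0x10)` hits only on the fourth probe. A subsequent `partial_compare(0x21, k=4)` hits in two probes because only one tag shares its low nibble, and a miss whose partial bits match no tag costs a single probe instead of a scan of all four tags. A false partial match (same low bits, different full tag) still costs an extra full-tag probe, which is why the choice of k trades tag-memory width against serial probes.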