Two Fast and High-Associativity Cache Schemes

Authors:
Chenxi Zhang; Xiaodong Zhang; Yong Yan
Venue:
IEEE Micro
Year:
1997

Citing 10

Cache design of a sub-micron CMOS system/370. ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Cache Operations by MRU Change. IEEE Transactions on Computers
Cache performance of operating system and multiprogramming workloads. ACM Transactions on Computer Systems (TOCS)
Multiprocessor cache analysis using ATUM. ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Inexpensive implementations of set-associativity. ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Evaluating Associativity in CPU Caches. IEEE Transactions on Computers
A case for two-way skewed-associative caches. ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches. ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Predictive sequential associative cache. HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture

Cited 18

Capturing dynamic memory reference behavior with adaptive cache topology. Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-optimal methods for bit-reversals. SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Access-Mode Predictions for Low-Power Cache Design. IEEE Micro
Partitioned first-level cache design for clustered microarchitectures. ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Hierarchical Binary Set Partitioning in Cache Memories. The Journal of Supercomputing
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches. Proceedings of the 33rd annual international symposium on Computer Architecture
Heterogeneous way-size cache. Proceedings of the 20th annual international conference on Supercomputing
Reducing cache misses through programmable decoders. ACM Transactions on Architecture and Code Optimization (TACO)
Design of a remote controlled caching proxy system: architecture, algorithm and implementation. TELE-INFO'05 Proceedings of the 4th WSEAS International Conference on Telecommunications and Informatics
Reconfigurable energy efficient near threshold cache architectures. Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Recruiting Decay for Dynamic Power Reduction in Set-Associative Caches. Transactions on High-Performance Embedded Architectures and Compilers II
Adaptive line placement with the set balancing cache. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Applying decay to reduce dynamic power in set-associative caches. HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
The ZCache: Decoupling Ways and Associativity. MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Soft error mitigation in cache memories of embedded systems by means of a protected scheme. LADC'05 Proceedings of the Second Latin-American conference on Dependable Computing
ASCIB: adaptive selection of cache indexing bits for removing conflict misses. Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
TLC: a tag-less cache for reducing dynamic first level cache energy. Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Virtually split cache: An efficient mechanism to distribute instructions and data. ACM Transactions on Architecture and Code Optimization (TACO)

Abstract

A traditional set-associative cache has a longer access cycle time than a direct-mapped cache. Several methods have been proposed for implementing associativity in non-traditional ways, but most of them achieve an associativity of only two, and those with higher associativity still have longer access cycle times or larger average access times. In this paper, we first systematically address the implementation issues of associativity and evaluate existing implementation methods. We then propose two schemes for implementing higher associativity: the Sequential Multi-Column Cache, which is an extension of the Column Associative Cache, and the Parallel Multi-Column Cache. To achieve the same access cycle time as a direct-mapped cache, both schemes organize the data memory of the cache into one bank. We use the multiple MRU block technique to increase the first hit ratio, thus reducing the average access time. While the Parallel Multi-Column Cache performs tag checking in parallel, the Sequential Multi-Column Cache searches sequentially through the places in a set and uses index information to filter out unnecessary probes. With an associativity of 4, both schemes achieve the low miss rate of a 4-way set-associative cache. Our simulation results using ATUM traces show that both schemes effectively reduce the average access time. For a cache size of 4K bytes and a miss penalty of 20 cycles, they improve the average access time over a direct-mapped cache by 9.8% and 10.8% on average, respectively, whereas the Column Associative Cache improves it by only 4.3% on average. The improvement of the Sequential Multi-Column Cache in average access time reaches 22.4% when the associativity is 8 and the miss penalty increases to 100 cycles. The two schemes are effective for both small and large caches (1K bytes to 128K bytes).
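
The trade-off described in the abstract, between probing the most recently used (MRU) place first and paying extra cycles for later probes, can be illustrated with a toy model. The sketch below is not the paper's design. It assumes a cost of one cycle per probe, true LRU replacement, a miss that probes every way before paying the miss penalty, and no index-based filtering of probes. The names SequentialProbeCache and average_access_time are hypothetical and exist only for this illustration.

```python
class SequentialProbeCache:
    """Toy model (assumed, not the paper's hardware): each set is a list of
    tags ordered from most recently used to least recently used."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]

    def access(self, block_addr):
        """Return the number of probes used on a hit, or None on a miss."""
        lines = self.sets[block_addr % self.num_sets]
        tag = block_addr // self.num_sets
        if tag in lines:
            probes = lines.index(tag) + 1  # the MRU place is probed first
            lines.remove(tag)
            lines.insert(0, tag)           # promote the block to MRU
            return probes
        lines.insert(0, tag)               # fill the new block as MRU
        if len(lines) > self.ways:
            lines.pop()                    # evict the LRU block
        return None


def average_access_time(cache, trace, miss_penalty):
    """Assumed cost model: one cycle per probe; a miss probes every way
    and then pays miss_penalty cycles."""
    cycles = 0
    for block_addr in trace:
        probes = cache.access(block_addr)
        if probes is None:
            cycles += cache.ways + miss_penalty
        else:
            cycles += probes
    return cycles / len(trace)


# Two blocks that conflict in a direct-mapped cache with 8 sets.
trace = [0, 8] * 4

# Same capacity (8 blocks): direct-mapped versus 4-way with sequential probing.
direct_mapped = average_access_time(SequentialProbeCache(8, 1), trace, miss_penalty=20)
four_way = average_access_time(SequentialProbeCache(2, 4), trace, miss_penalty=20)
print(direct_mapped, four_way)
```

With ways set to 1 the model reduces to a direct-mapped cache, so the same class serves as the baseline. On this small conflicting trace the direct-mapped cache misses on every access, while the 4-way model misses only twice and then hits on the second probe each time. The numbers come from the toy cost model only and say nothing about the results reported in the paper.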