A traditional implementation of a set-associative cache has a longer access cycle time than a direct-mapped cache. Several methods have been proposed for implementing associativity in non-traditional ways. However, most of them achieve an associativity of only two, and those with higher associativity still have longer access cycle times or larger average access times. In this paper, we first systematically address the implementation issues of associativity and evaluate existing implementation methods. We then propose two schemes for implementing higher associativity: the Sequential Multi-Column Cache, which is an extension of the Column Associative Cache, and the Parallel Multi-Column Cache. To achieve the same access cycle time as a direct-mapped cache, both schemes organize the data memory in the cache into one bank. We use the multiple MRU block technique to increase the first hit ratio, thus reducing the average access time. While the Parallel Multi-Column Cache performs the tag checking in parallel, the Sequential Multi-Column Cache searches sequentially through the locations in a set and uses index information to filter out unnecessary probes. With an associativity of 4, both achieve the low miss rate of a 4-way set-associative cache. Our simulation results using ATUM traces show that both schemes effectively reduce the average access time. For a cache size of 4K bytes and a miss penalty of 20 cycles, they improve average access time over a direct-mapped cache by 9.8% and 10.8% on average, respectively, whereas the Column Associative Cache improves it by only 4.3% on average. The improvement of the Sequential Multi-Column Cache in average access time reaches 22.4% when the associativity is 8 and the miss penalty increases to 100 cycles. The two schemes are effective for both small and large caches (1K bytes to 128K bytes).
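The idea of probing the most recently used (MRU) location first, and its effect on average access time, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names `probe_set` and `average_access_time`, the list-based set representation, and the cost model (one cycle for a first hit, a fixed number of extra cycles for a later hit, one cycle plus the miss penalty for a miss) are all assumptions made for illustration. The index-based filtering of unnecessary probes is not modeled.

```python
def probe_set(mru_order, tag):
    """Search one cache set sequentially, most recently used tag first.

    mru_order: list of tags, index 0 is the most recently used (hypothetical
    representation; it is updated in place on a hit).
    Returns the number of probes needed on a hit, or None on a miss.
    """
    for probes, stored in enumerate(mru_order, start=1):
        if stored == tag:
            # Move the hit tag to the front so the next access probes it first.
            mru_order.insert(0, mru_order.pop(probes - 1))
            return probes
    return None


def average_access_time(first_hit_ratio, miss_rate, extra_hit_cycles, miss_penalty):
    """Average access time in cycles under a simplified, assumed cost model.

    First hits cost 1 cycle, later hits cost 1 + extra_hit_cycles,
    and misses cost 1 + miss_penalty.
    """
    later_hit_ratio = 1.0 - first_hit_ratio - miss_rate
    return (first_hit_ratio * 1
            + later_hit_ratio * (1 + extra_hit_cycles)
            + miss_rate * (1 + miss_penalty))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    cache_set = ["a", "b", "c", "d"]
    print(probe_set(cache_set, "c"), cache_set)
    print(average_access_time(0.90, 0.05, 1, 20))
```

Under this model, a higher first hit ratio moves weight from the slower later-hit term to the one-cycle term, which is why keeping MRU information can reduce average access time even when the miss rate is unchanged.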