Inexpensive implementations of set-associativity
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
A case for two-way skewed-associative caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Cache designs with partial address matching
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Avoiding conflict misses dynamically in large direct-mapped caches
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
The memory wall and the CMOS end-point
ACM SIGARCH Computer Architecture News
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Improving cache performance with balanced tag and data paths
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
LRU-based column-associative caches
ACM SIGARCH Computer Architecture News
Capturing dynamic memory reference behavior with adaptive cache topology
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Cold-start vs. warm-start miss ratios
Communications of the ACM
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An adaptive serial-parallel CAM architecture for low-power cache blocks
Proceedings of the 2002 international symposium on Low power electronics and design
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Improved indexing for cache miss reduction in embedded systems
Proceedings of the 40th annual Design Automation Conference
Predictive sequential associative cache
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
Exploring High Bandwidth Pipelined Cache Architecture for Scaled Technology
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A way-halting cache for low-energy high-performance systems
ACM Transactions on Architecture and Code Optimization (TACO)
Using Prime Numbers for Cache Indexing to Eliminate Conflict Misses
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches
Proceedings of the 33rd annual international symposium on Computer Architecture
On reducing load/store latencies of cache accesses
Journal of Systems Architecture: the EUROMICRO Journal
Runtime adaptation: a case for reactive code alignment
Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Hi-index | 0.00 |
Level-one caches normally reside on a processor's critical path, which determines clock frequency. Therefore, fast access to level-one cache is important. Direct-mapped caches exhibit faster access time, but poor hit rates, compared with same sized set-associative caches because of nonuniform accesses to the cache sets. The nonuniform accesses generate more cache misses in some sets, while other sets are underutilized. We propose to increase the decoder length and, hence, reduce the accesses to heavily used sets without dynamically detecting the cache set usage information. We increase the access to the underutilized cache sets by incorporating a replacement policy into the cache design using programmable decoders. On average, the proposed techniques achieve as low a miss rate as a traditional 4-way cache on all 26 SPEC2K benchmarks for the instruction and data caches, respectively. This translates into an average IPC improvement of 21.5 and 42.4% for SPEC2K integer and floating-point benchmarks, respectively. The B-Cache consumes 10.5% more power per access, but exhibits a 12% total memory access-related energy savings as a result of the miss rate reductions, and, hence, the reduction to applications' execution time. Compared with previous techniques that aim at reducing the miss rate of direct-mapped caches, our technique requires only one cycle to access all cache hits and has the same access time of a direct-mapped cache.