Reducing cache misses through programmable decoders

Authors: Chuanjun Zhang
Affiliation: University of Missouri-Kansas City, Kansas City, Missouri
Venue: ACM Transactions on Architecture and Code Optimization (TACO)
Year: 2008

Abstract

Level-one caches normally reside on a processor's critical path, which determines clock frequency, so fast access to the level-one cache is important. Direct-mapped caches have faster access times but poorer hit rates than same-sized set-associative caches because accesses are distributed nonuniformly across the cache sets: some sets incur many misses while others are underutilized. We propose to increase the decoder length and thereby reduce accesses to heavily used sets without dynamically detecting cache set usage. We increase accesses to the underutilized sets by incorporating a replacement policy into the cache design using programmable decoders. On average across all 26 SPEC2K benchmarks, the proposed techniques achieve a miss rate as low as that of a traditional 4-way cache for both the instruction and data caches. This translates into average IPC improvements of 21.5% and 42.4% for the SPEC2K integer and floating-point benchmarks, respectively. The proposed design, referred to as the B-Cache, consumes 10.5% more power per access but yields a 12% saving in total memory-access-related energy as a result of the reduced miss rate and the resulting reduction in application execution time. Compared with previous techniques that aim to reduce the miss rate of direct-mapped caches, our technique serves all cache hits in a single cycle and has the same access time as a direct-mapped cache.
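
The conflict-miss behavior that motivates the abstract can be illustrated with a small simulation. The sketch below is not the paper's programmable-decoder design; it only compares a conventional direct-mapped cache with a conventional 4-way LRU set-associative cache of the same size. The function name `simulate`, the cache geometry (8 lines of 16 bytes), and the address trace are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


def simulate(addresses, num_lines=8, ways=1, line_size=16):
    """Count misses for an LRU set-associative cache (ways=1 is direct-mapped).

    All parameters are illustrative assumptions, not values from the paper.
    """
    num_sets = num_lines // ways
    # One ordered dict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_size      # strip the byte offset
        index = block % num_sets       # set selected by the index bits
        tag = block // num_sets        # remaining high-order bits
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses


# Hypothetical trace: two blocks that share a set in the direct-mapped cache,
# accessed alternately so that each access evicts the other block.
trace = [0x000, 0x080] * 10

dm_misses = simulate(trace, ways=1)
sa_misses = simulate(trace, ways=4)
print("direct-mapped misses:", dm_misses)  # prints 20
print("4-way misses:", sa_misses)          # prints 2
```

In this trace both addresses map to set 0 of the direct-mapped cache, so every access misses while the other seven sets stay idle. The 4-way cache holds both blocks in one set and misses only on the first touch of each, which is the kind of gap the abstract aims to close without giving up direct-mapped access time.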