Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches

Authors:
Chuanjun Zhang
Affiliations:
University of Missouri-Kansas City
Venue:
Proceedings of the 33rd annual international symposium on Computer Architecture
Year:
2006

Citing 23
Cited 17

A case for two-way skewed-associative caches

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Cache designs with partial address matching

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Avoiding conflict misses dynamically in large direct-mapped caches

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The difference-bit cache

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Improving cache performance with balanced tag and data paths

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The SimpleScalar tool set, version 2.0

ACM SIGARCH Computer Architecture News
LRU-based column-associative caches

ACM SIGARCH Computer Architecture News
Capturing dynamic memory reference behavior with adaptive cache topology

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache decay: exploiting generational behavior to reduce cache leakage power

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An adaptive serial-parallel CAM architecture for low-power cache blocks

Proceedings of the 2002 international symposium on Low power electronics and design
Two Fast and High-Associativity Cache Schemes

IEEE Micro
Improved indexing for cache miss reduction in embedded systems

Proceedings of the 40th annual Design Automation Conference
Predictive sequential associative cache

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A highly configurable cache architecture for embedded systems

Proceedings of the 30th annual international symposium on Computer architecture
Exploring High Bandwidth Pipelined Cache Architecture for Scaled Technology

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A way-halting cache for low-energy high-performance systems

ACM Transactions on Architecture and Code Optimization (TACO)
Using Prime Numbers for Cache Indexing to Eliminate Conflict Misses

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
An efficient direct mapped instruction cache for application-specific embedded systems

CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Balanced Instruction Cache: Reducing Conflict Misses of Direct-Mapped Caches through Balanced Subarray Accesses

IEEE Computer Architecture Letters

Compiler-managed partitioned data caches for low power

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Reducing cache misses through programmable decoders

ACM Transactions on Architecture and Code Optimization (TACO)
YAARC: yet another approach to further reducing the rate of conflict misses

The Journal of Supercomputing
Adaptive set pinning: managing shared caches in chip multiprocessors

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Direct address translation for virtual memory in energy-efficient embedded systems

ACM Transactions on Embedded Computing Systems (TECS)
A novel cache architecture with enhanced performance and security

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Adaptive line placement with the set balancing cache

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Customized placement for high performance embedded processor caches

ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
Thread owned block cache: managing latency in many-core architecture

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Cache equalizer: a placement mechanism for chip multiprocessor distributed shared caches

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Full Length Article: An on-chip instruction cache design with one-bit tag for low-power embedded systems

Microprocessors & Microsystems
An energy-efficient adaptive hybrid cache

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
DEFCAM: A design and evaluation framework for defect-tolerant cache memories

ACM Transactions on Architecture and Code Optimization (TACO)
Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines

Proceedings of the 2nd ACM Symposium on Cloud Computing
A comparative analysis of performance improvement schemes for cache memories

Computers and Electrical Engineering
ASCIB: adaptive selection of cache indexing bits for removing conflict misses

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Virtually split cache: An efficient mechanism to distribute instructions and data

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Level one cache normally resides on a processor's critical path, which determines the clock frequency. Directmapped caches exhibit fast access time but poor hit rates compared with same sized set-associative caches due to nonuniform accesses to the cache sets, which generate more conflict misses in some sets while other sets are underutilized. We propose a technique to reduce the miss rate of direct mapped caches through balancing the accesses to cache sets. We increase the decoder length and thus reduce the accesses to heavily used sets without dynamically detecting the cache set usage information. We introduce a replacement policy to direct-mapped cache design and increase the access to the underutilized cache sets with the help of programmable decoders. On average, the proposed balanced cache, or BCache, achieves 64.5% and 37.8% miss rate reductions on all 26 SPEC2K benchmarks for the instruction and data caches, respectively. This translates into an average IPC improvement of 5.9%. The B-Cache consumes 10.5% more power per access but exhibits 2% total memory access related energy saving due to the miss rate reductions and hence the reduction to applications' execution time. Compared with previous techniques that aim at reducing the miss rate of direct-mapped caches, our technique requires only one cycle to access all cache hits and has the same access time of a direct-mapped cache.