Improved indexing for cache miss reduction in embedded systems

Authors: Tony Givargis
Affiliations: University of California, Irvine, CA
Venue: Proceedings of the 40th annual Design Automation Conference
Year: 2003

Abstract

The increasing use of microprocessor cores in embedded systems, as well as in mobile and portable devices, creates an opportunity to customize the cache subsystem for improved performance. In traditional cache design, the index portion of the memory address bus consists of the K least significant bits, where K = log2(D) and D is the depth of the cache. However, in devices where the application set is known and characterized (e.g., systems that execute a fixed application set), cache performance can be improved by choosing an optimal set of bits to use as the index into the cache. This technique adds no overhead in terms of area or delay. We give an efficient heuristic algorithm for selecting K index bits for improved cache performance. We show the feasibility of our algorithm by applying it to a large number of embedded system applications as well as to the integer SPEC CPU 2000 benchmarks.
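The idea of indexing with bits other than the K least significant ones can be illustrated with a small sketch. The code below is not the heuristic from the paper, which the abstract does not describe. It is a brute-force search over all K-bit subsets, assuming a direct-mapped cache with one word per line and a hypothetical four-entry address trace. The names extract_index, count_misses, best_index_bits, and TRACE are illustrative only.

```python
from itertools import combinations


def extract_index(addr, bit_positions):
    """Gather the chosen address bits into a cache index (first position = bit 0)."""
    index = 0
    for i, pos in enumerate(bit_positions):
        index |= ((addr >> pos) & 1) << i
    return index


def count_misses(trace, bit_positions):
    """Count misses in a direct-mapped cache with one word per line.

    Assumption: the full address is stored as the tag, which is enough
    to decide hit or miss in this simplified model.
    """
    lines = {}
    misses = 0
    for addr in trace:
        idx = extract_index(addr, bit_positions)
        if lines.get(idx) != addr:
            misses += 1
            lines[idx] = addr
    return misses


def best_index_bits(trace, k, addr_bits):
    """Brute-force search (not the paper's heuristic) over all k-bit subsets."""
    return min(combinations(range(addr_bits), k),
               key=lambda bits: count_misses(trace, bits))


# Hypothetical trace: four addresses with a stride of 16, visited four times.
TRACE = [0x00, 0x10, 0x20, 0x30] * 4

depth = 4                      # D, the cache depth
k = depth.bit_length() - 1     # K = log2(D) = 2 for a power-of-two depth

conventional = list(range(k))              # the K least significant bits
chosen = best_index_bits(TRACE, k, addr_bits=8)

print(count_misses(TRACE, conventional))   # 16: every access conflicts on line 0
print(chosen)                              # (4, 5)
print(count_misses(TRACE, chosen))         # 4: only the cold misses remain
```

In this trace the four addresses differ only in bits 4 and 5, so the conventional index maps all of them to the same line, while indexing on bits 4 and 5 gives each address its own line. Exhaustive search is feasible only for toy sizes like this one, which is why an efficient heuristic is needed for realistic address widths.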