Computer organization & design: the hardware/software interface
Computer organization & design: the hardware/software interface
Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Scalable high speed IP routing lookups
SIGCOMM '97 Proceedings of the ACM SIGCOMM '97 conference on Applications, technologies, architectures, and protocols for computer communication
Run-time spatial locality detection and optimization
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Compiler optimizations for eliminating cache conflict misses
Compiler optimizations for eliminating cache conflict misses
The case for a configure-and-execute paradigm
CODES '99 Proceedings of the seventh international workshop on Hardware/software codesign
Global multimedia system design exploration using accurate memory organization feedback
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Towards effective embedded processors in codesigns: customizable partitioned caches
Proceedings of the ninth international symposium on Hardware/software codesign
An Introduction to the Theory of Computation
An Introduction to the Theory of Computation
Improving cache Performance Through Tiling and Data Alignment
IRREGULAR '97 Proceedings of the 4th International Symposium on Solving Irregularly Structured Problems in Parallel
Fast address look-up for internet routers
BC '98 Proceedings of the IFIP TC6/WG6.2 Fourth International Conference on Broadband Communications: The future of telecommunications
Tag Overflow Buffering: An Energy-Efficient Cache Architecture
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Automated data cache placement for embedded VLIW ASIPs
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
An efficient direct mapped instruction cache for application-specific embedded systems
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Reducing cache misses by application-specific re-configurable indexing
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Application-specific reconfigurable XOR-indexing to eliminate cache conflict misses
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches
Proceedings of the 33rd annual international symposium on Computer Architecture
Reducing cache misses through programmable decoders
ACM Transactions on Architecture and Code Optimization (TACO)
Constructing optimal XOR-functions to minimize cache conflict misses
ARCS'08 Proceedings of the 21st international conference on Architecture of computing systems
A comparative analysis of performance improvement schemes for cache memories
Computers and Electrical Engineering
ASCIB: adaptive selection of cache indexing bits for removing conflict misses
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Hi-index | 0.00 |
The increasing use of microprocessor cores in embedded systems as well as mobile and portable devices creates an opportunity for customizing the cache subsystem for improved performance. In traditional cache design, the index portion of the memory address bus consists of the K least significant bits, where K=log2(D) and D is the depth of the cache. However, in devices where the application set is known and characterized (e.g., systems that execute a fixed application set) there is an opportunity to improve cache performance by choosing an optimal set of bits used as index into the cache. This technique does not add any overhead in terms of area or delay. We give an efficient heuristic algorithm for selecting K index bits for improved cache performance. We show the feasibility of our algorithm by applying it to a large number of embedded system applications as well as the integer SPEC CPU 2000 benchmarks.