Zero cost indexing for improved processor cache performance

Authors: Tony Givargis
Affiliations: University of California, Irvine, Irvine, CA
Venue: ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year: 2006

Abstract

The increasing use of microprocessor cores in embedded systems, as well as in mobile and portable devices, creates an opportunity to customize the cache subsystem for improved performance. In traditional cache design, the index portion of the memory address consists of the K least significant bits, where K = log2(D) and D is the depth of the cache. However, in devices where the application set is known and characterized (e.g., systems that execute a fixed application set), cache performance can be improved by choosing a near-optimal set of address bits to use as the index into the cache. This technique adds no overhead in terms of area or delay. In this article, we present an efficient heuristic algorithm for selecting the K index bits. We show the feasibility of our algorithm by applying it to a large number of embedded system applications as well as the integer SPEC CPU 2000 benchmarks. Specifically, for data traces, we show up to a 45% reduction in cache misses. Likewise, for instruction traces, we show up to a 31% reduction in cache misses. When a unified data/instruction cache architecture is considered, our results show an average improvement of 14.5% for the Powerstone benchmarks and an average improvement of 15.2% for the SPEC'00 benchmarks.
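
To make the idea of index-bit selection concrete, the following Python sketch simulates a direct-mapped cache whose index is built from an arbitrary set of K address bits, and compares the conventional choice (the K least significant bits) against the best K-bit subset for a small trace. It is an illustration only, under stated assumptions: the trace is hypothetical, each cache line holds a single word, the full address serves as the tag, and the exhaustive search in `best_index_bits` is a stand-in for, not a reproduction of, the article's heuristic algorithm. All function names are invented for this example.

```python
from itertools import combinations


def extract_index(addr, bits):
    """Build a cache index by concatenating the chosen address bits."""
    index = 0
    for pos, bit in enumerate(bits):
        index |= ((addr >> bit) & 1) << pos
    return index


def count_misses(trace, bits):
    """Count misses in a direct-mapped cache indexed by `bits`.

    Assumption: one word per line, with the full address kept as the tag.
    """
    lines = {}  # index -> address currently stored in that line
    misses = 0
    for addr in trace:
        idx = extract_index(addr, bits)
        if lines.get(idx) != addr:
            misses += 1
            lines[idx] = addr
    return misses


def best_index_bits(trace, k, addr_width):
    """Exhaustively try every K-bit subset (only feasible for tiny widths).

    This brute-force search is NOT the article's heuristic; it only shows
    what such a heuristic would try to approximate.
    """
    return min(
        combinations(range(addr_width), k),
        key=lambda bits: count_misses(trace, bits),
    )


# Hypothetical trace: a loop alternating between two arrays 16 words apart.
trace = [0, 16, 1, 17] * 8

k = 2                 # cache depth D = 4, so K = log2(4) = 2
conventional = (0, 1)  # the K least significant bits
chosen = best_index_bits(trace, k, addr_width=5)

print("conventional bits", conventional, "misses:", count_misses(trace, conventional))
print("chosen bits      ", chosen, "misses:", count_misses(trace, chosen))
```

In this toy trace the conventional index maps addresses 0 and 16 (and likewise 1 and 17) to the same line, so they evict each other on every access. Indexing on bits 0 and 4 separates them, leaving only the four cold misses. Because the selection changes only which address wires feed the index, the cache structure itself is unchanged, which is consistent with the abstract's claim of no area or delay overhead.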