A first look at the interplay of code reordering and configurable caches

Authors:
Ann Gordon-Ross; Frank Vahid; Nikil Dutt
Affiliations:
University of California, Riverside, CA; University of California, Riverside, CA; University of California, Irvine, CA
Venue:
GLSVLSI '05: Proceedings of the 15th ACM Great Lakes Symposium on VLSI
Year:
2005

Abstract

The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache's high impact on system performance and power, and because instruction accesses exhibit predictable temporal and spatial locality. Optimization techniques can be designed to exploit this predictability. We explore, for the first time, the interplay of two popular instruction cache optimization techniques: the long-known technique of code reordering and the relatively new technique of cache configuration. We address whether the two optimizations complement each other or whether one dominates the other. Through experiments using embedded system benchmarks, we show that, with respect to optimizing performance and energy, cache configuration dominates a particular category of code reordering techniques, obviating the need for reordering. We also examine the modern scenario of synthesized custom caches and show that combining cache configuration with code reordering reduces cache size by 13% on average, and by up to 89% for some benchmarks, beyond what cache configuration alone achieves.
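The interplay described above can be illustrated with a toy instruction-cache model. This is a minimal sketch, not the authors' simulator or tuning method. It assumes an LRU set-associative cache, 4-byte sequential instruction fetches, and a hypothetical program whose two 64-byte hot functions, A and B, are called alternately. Code reordering is modeled as moving B next to A, and cache configuration is modeled as changing cache size and associativity. The function names (`simulate`, `fetch_trace`, `smallest_conflict_free`), addresses, and sizes are all illustrative assumptions.

```python
from collections import OrderedDict


def simulate(trace, size_bytes, line_bytes, assoc):
    """Count misses of an LRU, set-associative instruction cache on an address trace."""
    num_sets = size_bytes // (line_bytes * assoc)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)  # hit: mark block as most recently used
        else:
            misses += 1
            if len(lines) == assoc:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
    return misses


def fetch_trace(layout, calls, fetch_bytes=4):
    """Expand a call sequence into sequential instruction-fetch addresses.

    layout maps a (hypothetical) function name to (start_address, size_in_bytes).
    """
    trace = []
    for name in calls:
        start, size = layout[name]
        trace.extend(range(start, start + size, fetch_bytes))
    return trace


def smallest_conflict_free(layout, calls, line_bytes, assoc, sizes):
    """Return the smallest cache size whose only misses are compulsory (first-touch) ones."""
    trace = fetch_trace(layout, calls)
    compulsory = len({addr // line_bytes for addr in trace})
    for size in sorted(sizes):
        if simulate(trace, size, line_bytes, assoc) == compulsory:
            return size
    return None


# Hypothetical example: two 64-byte hot functions called alternately.
calls = ["A", "B"] * 100
original = {"A": (0, 64), "B": (1024, 64)}   # B placed after 960 bytes of cold code
reordered = {"A": (0, 64), "B": (64, 64)}    # B moved next to A

sizes = [128, 256, 512, 1024, 2048, 4096]
for name, layout in [("original", original), ("reordered", reordered)]:
    trace = fetch_trace(layout, calls)
    print(name,
          "| 1KB direct-mapped misses:", simulate(trace, 1024, 32, 1),
          "| 1KB 2-way misses:", simulate(trace, 1024, 32, 2),
          "| smallest conflict-free direct-mapped size:",
          smallest_conflict_free(layout, calls, 32, 1, sizes))
```

In this toy case, A and B map to the same sets of a 1 KB direct-mapped cache under the original layout, giving 400 misses. Either a 2-way configuration or the reordered layout cuts this to the 4 compulsory misses, which loosely mirrors the observation that configuration can make reordering unnecessary. Searching for the smallest conflict-free direct-mapped cache yields 2048 bytes for the original layout versus 128 bytes for the reordered one. This is a hypothetical analogue of the size reductions reported for synthesized custom caches, and the specific numbers say nothing about the paper's benchmarks.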