Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Reducing branch costs via branch alignment
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Procedure placement using temporal ordering information
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Alto: a link-time optimizer for the Compaq alpha
Software—Practice & Experience
Cache Configuration Exploration on Prototyping Platforms
RSP '03 Proceedings of the 14th IEEE International Workshop on Rapid System Prototyping (RSP'03)
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
Code Reorginazation for Instruction Caches
Code Reorginazation for Instruction Caches
Code placement using temporal profile information
Code placement using temporal profile information
Automatic Tuning of Two-Level Caches to Embedded Applications
Proceedings of the conference on Design, automation and test in Europe - Volume 1
A Self-Tuning Cache Architecture for Embedded Systems
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Instrumentation and optimization of Win32/intel executables using Etch
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Spike: an optimizer for alpha/NT executables
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Reducing startup latency in web and desktop applications
WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
On the interplay of loop caching, code compression, and cache configuration
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Joint task assignment and cache partitioning with cache locking for WCET minimization on MPSoC
Journal of Parallel and Distributed Computing
Instruction cache locking for multi-task real-time embedded systems
Real-Time Systems
Instruction Cache Locking for Embedded Systems using Probability Profile
Journal of Signal Processing Systems
Hi-index | 0.00 |
The instruction cache is a popular target for optimizations of microprocessor-based systems because of the cache's high impact on system performance and power, and because of the cache's predictable temporal and spatial locality. Optimization techniques can be designed based on this predictability. We explore for the first time the interplay of two popular instruction cache optimization techniques: the long-known technique of code reordering and the relatively-new technique of cache configuration. We address the question of whether those two optimizations complement each other or if one optimization dominates the other. Through experiments using embedded system benchmarks, we show that cache configuration dominates a particular category of code reordering techniques with respect to optimizing performance and energy, obviating the need for reordering. We also examine the modern scenario of synthesized custom caches, and show that combining cache configuration with code reordering results in cache size reductions of 13% on average, and up to 89% in some benchmarks, beyond just cache configuration alone.