The instruction cache is a popular optimization target because of its high impact on system performance and power and because of its predictable temporal and spatial locality. This article is an in-depth study of the interaction between code reordering (a long-known technique) and cache configuration (a relatively new technique). Experimental results show that coupling code reordering with cache configuration yields additional energy savings as high as 10-15% for several benchmarks, along with cache area reductions as high as 48%. To exploit these additional benefits, we develop and evaluate several design exploration heuristics for combining the two methods.