Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Set-associative cache simulation using generalized binomial trees
ACM Transactions on Computer Systems (TOCS)
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Code placement techniques for cache miss rate reduction
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Analysis of Temporal-Based Program Behavior for Improved Instruction Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
Procedure placement using temporal-ordering information: dealing with code size expansion
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Cache optimization for embedded processor cores: An analytical approach
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Memory allocation for embedded systems with a compile-time-unknown scratch-pad size
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Optimizing instruction cache performance of embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
StatCache: a probabilistic approach to efficient and accurate data locality analysis
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Static analysis for fast and accurate design space exploration of caches
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Instruction cache locking inside a binary rewriter
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Instruction cache locking using temporal reuse profile
Proceedings of the 47th Design Automation Conference
WCET-driven cache-aware code positioning
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
WCET-centric partial instruction cache locking
Proceedings of the 49th Annual Design Automation Conference
Real-time implementation and performance optimization of 3D sound localization on GPUs
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
An analytical approach for fast and accurate design space exploration of instruction caches
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
The performance of most embedded systems is critically dependent on the memory hierarchy performance. In particular, higher cache hit rate can provide significant performance boost to an embedded application. Procedure placement is a popular technique that aims to improve instruction cache hit rate by reducing conflicts in the cache through compile/link time reordering of procedures. However, existing procedure placement techniques make reordering decisions based on imprecise conflict information. This imprecision leads to limited and sometimes negative performance gain, specially for set-associative caches. In this paper, we introduce intermediate blocks profile (IBP) to accurately but compactly model cost-benefit of procedure placement for both direct mapped and set associative caches. We propose an efficient algorithm that exploits IBP to place procedures in memory such that cache conflicts are minimized. Experimental results demonstrate that our approach provides substantial improvement in cache performance over existing procedure placement techniques. Furthermore, we observe that the code layout for a specific cache configuration is not portable across different cache configurations. To solve this problem, we propose an algorithm that exploits IBP to place procedures in memory such that the average cache miss rate across a set of cache configurations is minimized.