Improved procedure placement for set associative caches

Authors:
Yun Liang;Tulika Mitra
Affiliations:
Advanced Digital Sciences Center, Illinois at Singapore, Singapore, Singapore;National University of Singapore, Singapore, Singapore
Venue:
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2010

Abstract

The performance of most embedded systems depends critically on the performance of the memory hierarchy. In particular, a higher cache hit rate can provide a significant performance boost to an embedded application. Procedure placement is a popular technique that aims to improve the instruction cache hit rate by reducing cache conflicts through compile-time or link-time reordering of procedures. However, existing procedure placement techniques make reordering decisions based on imprecise conflict information. This imprecision leads to limited and sometimes negative performance gains, especially for set-associative caches. In this paper, we introduce the intermediate blocks profile (IBP) to accurately but compactly model the cost-benefit of procedure placement for both direct-mapped and set-associative caches. We propose an efficient algorithm that exploits IBP to place procedures in memory such that cache conflicts are minimized. Experimental results demonstrate that our approach provides substantial improvement in cache performance over existing procedure placement techniques. Furthermore, we observe that a code layout tuned for a specific cache configuration is not portable across different cache configurations. To solve this problem, we propose an algorithm that exploits IBP to place procedures in memory such that the average cache miss rate across a set of cache configurations is minimized.
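
To make the notion of a placement-induced cache conflict concrete, the following is a minimal sketch of a trace-driven LRU instruction cache model. It is not the IBP technique or the placement algorithm from the paper. The function name `count_misses`, the procedure names, sizes, addresses, and the trace are all hypothetical, and each call is assumed to fetch the whole procedure body once.

```python
from collections import OrderedDict


def count_misses(layout, trace, line_size=16, num_sets=4, assoc=1):
    """Count instruction-cache misses for a given procedure layout.

    layout: dict mapping procedure name -> (start_address, size_in_bytes)
    trace:  procedure names in execution order (hypothetical profile);
            each call is assumed to fetch the whole procedure body once.
    """
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU order
    misses = 0
    for name in trace:
        start, size = layout[name]
        first = start // line_size
        last = (start + size - 1) // line_size
        for block in range(first, last + 1):
            cache_set = sets[block % num_sets]
            if block in cache_set:
                cache_set.move_to_end(block)        # hit: most recently used
            else:
                misses += 1
                if len(cache_set) >= assoc:
                    cache_set.popitem(last=False)   # evict least recently used
                cache_set[block] = True
    return misses


# Hypothetical workload: A and C alternate in a hot loop, B runs once.
trace = ["A", "C"] * 100 + ["B"]

# Same three 32-byte procedures, two different memory orders.
layout_conflict = {"A": (0, 32), "B": (32, 32), "C": (64, 32)}
layout_better = {"A": (0, 32), "C": (32, 32), "B": (64, 32)}

print(count_misses(layout_conflict, trace))                       # direct-mapped
print(count_misses(layout_better, trace))                         # direct-mapped
print(count_misses(layout_conflict, trace, num_sets=2, assoc=2))  # 2-way
```

In the direct-mapped configuration, the first layout maps A and C to the same sets, so the two hot procedures evict each other on every call, while the second layout keeps them in disjoint sets. With the same capacity organized as a 2-way cache, the first layout no longer thrashes, which illustrates why a layout chosen for one cache configuration need not carry over to another.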