Procedure placement using temporal-ordering information: Dealing with code size expansion

Authors:
Christophe Guillon;Fabrice Rastello;Thierry Bidault;Florent Bouchez
Affiliations:
STMicroelectronics in France;École Normale Supérieure de Lyon in France (Corresponding author. E-mail: fabrice.rastello@ens-lyons.fr);STMicroelectronics in France;École Normale Supérieure de Lyon in France
Venue:
Journal of Embedded Computing - Cache exploitation in embedded systems
Year:
2005

Citing 10
Cited 8

Introduction to algorithms

Profile guided code positioning

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Efficient procedure mapping using cache line coloring

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Procedure placement using temporal ordering information

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Linear and Time Minimum-Cost Matching Algorithms for Quasi-Convex Tours

SIAM Journal on Computing
Efficiency of a Good But Not Linear Set Union Algorithm

Journal of the ACM (JACM)
Procedure placement using temporal-ordering information

ACM Transactions on Programming Languages and Systems (TOPLAS)
Lx: a technology platform for customizable VLIW embedded processing

Proceedings of the 27th annual international symposium on Computer architecture
Temporal-Based Procedure Reordering for Improved Instruction Cache Performance

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Algorithmic Graph Theory and Perfect Graphs (Annals of Discrete Mathematics, Vol 57)

Optimal task placement to improve cache performance

EMSOFT '07 Proceedings of the 7th ACM & IEEE international conference on Embedded software
Techniques and tools for implementing IEEE 754 floating-point arithmetic on VLIW integer processors

Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Joint task assignment and cache partitioning with cache locking for WCET minimization on MPSoC

Journal of Parallel and Distributed Computing
WCET-driven branch prediction aware code positioning

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors

ACM Transactions on Embedded Computing Systems (TECS)
Instruction cache locking for multi-task real-time embedded systems

Real-Time Systems
WCET-centric partial instruction cache locking

Proceedings of the 49th Annual Design Automation Conference
Instruction Cache Locking for Embedded Systems using Probability Profile

Journal of Signal Processing Systems

Abstract

In a direct-mapped instruction cache, all instructions whose memory addresses are equal modulo the cache size share a single cache slot. Instruction cache conflicts can be partially handled at link time by procedure placement. Pettis and Hansen [1] give an algorithm that reorders procedures in memory by aggregating them in a greedy fashion. The Gloy and Smith algorithm [2] greatly decreases the number of conflict misses but increases the code size by allowing gaps between procedures. The latter consists of two main stages: the cache-placement phase assigns modulo addresses so as to minimize cache conflicts; the memory-placement phase assigns final memory addresses under the modulo placement constraints while minimizing the code size expansion. In this paper: (1) we prove the NP-completeness of the cache-placement problem; (2) we provide an optimal algorithm for the memory-placement problem with complexity O(n min(n, L) α(n)), where n is the number of procedures, L is the cache size, and α is the inverse Ackermann function, which is lower than 4 in practice; (3) we take final program size into consideration during the cache-placement phase. Our modifications to the Gloy and Smith algorithm give on average a code size expansion of 8% over the original program size, while the initial algorithm gave an expansion of 177%. The cache miss reduction is nearly the same as that of the Gloy and Smith solution, with a 35% cache miss reduction.
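
To make the source of the code size expansion concrete, here is a minimal Python sketch of a naive memory-placement step. It is not the optimal algorithm announced in the abstract, nor the Gloy and Smith procedure: it simply walks the procedures in a fixed order and pads each one forward until its start address matches the modulo offset chosen by an (assumed) earlier cache-placement phase. The names `place_in_order`, `procs`, and the byte-granular offsets are hypothetical and used for illustration only.

```python
def place_in_order(procs, cache_size):
    """Naively assign memory addresses to procedures, in the given order.

    procs: list of (name, size, offset) tuples, where `offset` is the
           required start address modulo `cache_size` (hypothetical output
           of a cache-placement phase), all in bytes.
    Returns (layout, total_size), where layout is a list of
    (name, start_address, size).
    """
    layout = []
    cursor = 0  # first free address after the previously placed procedure
    for name, size, offset in procs:
        # Smallest non-negative padding such that (cursor + gap) % cache_size == offset.
        gap = (offset - cursor) % cache_size
        start = cursor + gap
        layout.append((name, start, size))
        cursor = start + size
    return layout, cursor


CACHE_SIZE = 256  # illustrative direct-mapped cache size, in bytes

# Same procedures and same modulo offsets, in two different memory orders.
order_a = [("f", 96, 0), ("g", 64, 128), ("h", 32, 96)]
order_b = [("f", 96, 0), ("h", 32, 96), ("g", 64, 128)]

layout_a, total_a = place_in_order(order_a, CACHE_SIZE)
layout_b, total_b = place_in_order(order_b, CACHE_SIZE)
code_size = sum(size for _, size, _ in order_a)

print(layout_a, total_a)  # gaps appear before g and h
print(layout_b, total_b)  # no gaps with this ordering
print(code_size)
```

In this toy instance both orderings satisfy the same modulo constraints, yet the first one doubles the program size while the second one adds no padding at all. This is why the order in which procedures are laid out in memory matters for the memory-placement problem.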