Modern embedded CPU systems rely on a growing number of software features, but this growth enlarges the memory footprint and heightens the need for efficient instruction and data caches. An embedded operating system often juggles a changing set of tasks in round-robin fashion, which inevitably causes cache misses when different tasks conflict in the cache. Our technique reduces these misses by continuously monitoring CPU cache misses to grade the performance of running tasks. Through a series of step-wise refinements, our software system tunes the round-robin ordering to find a better temporal sequence for the tasks. This tuning is done dynamically during program execution and hence can adapt to changes in workload or external input stimuli. The benefits of this technique are illustrated using an ARM processor running application benchmarks with different cache organizations and round-robin scheduling techniques.
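The step-wise tuning of the round-robin order can be illustrated with a small sketch. The abstract does not specify the refinement rule or how misses are measured, so everything below is an assumption made for illustration: the function `tune_round_robin`, the adjacent-swap hill-climbing rule, and the toy `CONFLICT` table that stands in for hardware cache-miss counters are all hypothetical and are not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical pairwise conflict costs: misses incurred when one task
# runs immediately after the other. A real system would read hardware
# cache-miss counters instead of using a fixed table.
CONFLICT = {
    frozenset(("A", "B")): 10,
    frozenset(("C", "D")): 10,
    frozenset(("A", "C")): 1,
    frozenset(("A", "D")): 1,
    frozenset(("B", "C")): 1,
    frozenset(("B", "D")): 1,
}


def toy_misses(order: Sequence[str]) -> int:
    """Sum the conflict cost of each consecutive pair in the cyclic order."""
    n = len(order)
    return sum(
        CONFLICT[frozenset((order[i], order[(i + 1) % n]))] for i in range(n)
    )


def tune_round_robin(
    order: Sequence[str],
    measure_misses: Callable[[Sequence[str]], int],
    max_rounds: int = 10,
) -> Tuple[List[str], int]:
    """Assumed refinement rule: swap adjacent tasks and keep a swap only
    if the measured miss count drops; stop when a full pass finds no gain."""
    best = list(order)
    best_misses = measure_misses(best)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best) - 1):
            candidate = best[:]
            candidate[i], candidate[i + 1] = candidate[i + 1], candidate[i]
            misses = measure_misses(candidate)
            if misses < best_misses:
                best, best_misses = candidate, misses
                improved = True
        if not improved:
            break
    return best, best_misses


initial_order = ["A", "B", "C", "D"]
tuned_order, tuned_misses = tune_round_robin(initial_order, toy_misses)
```

In this toy model the initial order places the two heavily conflicting pairs back to back, and a single adjacent swap separates them. A dynamic version would re-run the refinement periodically against fresh counter readings, so the order can follow changes in workload.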