The ever-increasing gap between processor and memory speed is also an issue in embedded systems, because of the growing complexity of multimedia processing and the strict resource constraints of these devices. Profile-driven code optimization techniques can be used effectively to tune the interaction between an application and the cache, as well as the performance of the cache system itself, because applications running on such systems are usually known in advance and do not change over time. In a previous paper, we presented a profile-based code restructuring technique (CAT) that dramatically increased the cache exploitation of embedded applications. However, profile-driven optimizations are well known to suffer from input-sensitivity problems: an application optimized for a particular input can perform even worse than the original when subjected to other inputs. In this paper we consider JPEG and MPEG compressor/decompressor applications and analyze the input-sensitivity of CAT-improved layouts over a wide range of inputs. The input sets were carefully determined through both black-box and white-box analysis of the applications. We propose two metrics for measuring the input-sensitivity of application layouts, and show that our profile-driven code transformation technique reduces the input-sensitivity of the considered applications by up to 48% on caches ranging from 1 KByte to 8 KByte.
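To make the input-sensitivity problem concrete, the minimal Python sketch below models a direct-mapped instruction cache and compares two code layouts on two inputs. Everything in it is an assumption for illustration: the helper names `miss_rate` and `input_sensitivity`, the procedure layouts and traces, and the two measures it computes (the fraction of inputs on which the optimized layout is worse, and the spread of its miss rate) are hypothetical. They are not the CAT technique or the two metrics proposed in the paper.

```python
# Hypothetical illustration: the cache model, layouts, traces, and both
# measures are assumptions, not the CAT tool or the paper's metrics.

def miss_rate(trace, layout, cache_bytes, line_bytes=32):
    """Miss rate of a direct-mapped instruction cache.

    trace:  list of (procedure, byte_offset) instruction fetches
    layout: dict mapping procedure name -> start address in memory
    """
    num_lines = cache_bytes // line_bytes
    resident = [None] * num_lines  # memory block currently held by each line
    misses = 0
    for proc, offset in trace:
        block = (layout[proc] + offset) // line_bytes
        index = block % num_lines  # direct-mapped: exactly one candidate line
        if resident[index] != block:
            resident[index] = block
            misses += 1
    return misses / len(trace)


def input_sensitivity(orig_rates, opt_rates):
    """Two illustrative (hypothetical) measures over a set of inputs:
    - fraction of inputs on which the optimized layout misses more
      than the original layout;
    - spread (max - min) of the optimized layout's miss rate.
    """
    worse = sum(1 for o, p in zip(orig_rates, opt_rates) if p > o)
    spread = max(opt_rates) - min(opt_rates)
    return worse / len(opt_rates), spread


# 64-byte cache with two 32-byte lines; each procedure starts on a line boundary.
original = {"A": 0, "B": 64, "C": 32}   # A and B map to the same cache line
optimized = {"A": 0, "B": 32, "C": 64}  # tuned on a profile where A-B is hot

inputs = {
    "profiled": [("A", 0), ("B", 0)] * 4,  # input the layout was tuned on
    "other":    [("A", 0), ("C", 0)] * 4,  # different input: A-C is hot
}

orig_rates = [miss_rate(t, original, 64) for t in inputs.values()]
opt_rates = [miss_rate(t, optimized, 64) for t in inputs.values()]
print(orig_rates, opt_rates)                     # [1.0, 0.25] [0.25, 1.0]
print(input_sensitivity(orig_rates, opt_rates))  # (0.5, 0.75)
```

The layout that removes the A/B conflict on the profiled input introduces an A/C conflict on the other input, so the optimized code performs worse than the original there, which is the kind of regression the abstract describes.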