Cache misses form a major bottleneck for real-time multimedia applications because they cause off-chip accesses to main memory. These accesses incur both a large bandwidth overhead (with the related power consumption) and performance penalties. In this paper, we propose a new technique for organizing data in main memory for data-dominated multimedia applications so as to eliminate the majority of conflict cache misses. The paper focuses on the formal and heuristic algorithm we use to steer the data layout decisions and on the experimental results obtained with a prototype tool. Experiments on real-life demonstrators show that we can remove up to 82 percent of the conflict misses in applications that have already been aggressively transformed at source level. At the same time, we reduce off-chip data accesses by up to 78 percent. In addition, we remove up to 20 percent more conflict misses than existing techniques.
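To make the notion of a conflict miss concrete, the following is a minimal sketch of how the placement of arrays in main memory can change the miss count. It is not the algorithm proposed in the paper. It assumes a hypothetical direct-mapped cache of 1024 bytes with 32-byte lines, and the names `count_misses` and `interleaved_trace` are introduced here purely for illustration.

```python
def count_misses(addresses, cache_bytes=1024, line_bytes=32):
    """Count misses in a direct-mapped cache for a trace of byte addresses.

    The cache geometry is an assumption chosen for illustration only.
    """
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # memory block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this address
        index = block % num_lines      # cache line that block maps to
        if tags[index] != block:       # cold miss or conflict miss
            tags[index] = block
            misses += 1
    return misses


def interleaved_trace(base_a, base_b, n, elem_bytes=4):
    """Address trace of a loop that reads a[i] and then b[i] for i in range(n)."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_bytes)
        trace.append(base_b + i * elem_bytes)
    return trace


N = 256  # 256 elements of 4 bytes: each array is exactly one cache size

# Layout 1: b starts one cache size after a, so a[i] and b[i] always
# map to the same cache line and evict each other on every access.
conflicting = interleaved_trace(0, 1024, N)

# Layout 2: b is shifted by one cache line (32 bytes), so a[i] and b[i]
# map to different cache lines and only cold misses remain.
shifted = interleaved_trace(0, 1024 + 32, N)

print(count_misses(conflicting))  # 512
print(count_misses(shifted))      # 64
```

In this toy setting, moving one array by a single cache line removes every conflict miss and leaves only the 64 cold misses. The layout decisions described in the abstract address the same effect for real applications, where such offsets have to be chosen across many arrays at once.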