The data cache in embedded systems plays the dual role of speeding up program execution and reducing power consumption. However, a hardware-only cache management scheme usually results in unsatisfactory cache utilization. In several newer architectures, cache management details are accessible at the instruction level, enabling compiler involvement for better cache performance. In particular, the Intel XScale implements a cache-locking mechanism that lets the compiler lock certain critical data in the cache, with the guarantee that the locked data will not be evicted. In such an architecture, what to lock and when to lock are the key decisions for achieving good cache performance. To this end, this paper formulates the locking decision as a 0/1 knapsack problem, which can be solved efficiently with a dynamic programming algorithm. We implemented this formulation in the MIPSpro compiler; our approach reduces both execution time and power consumption. Power and performance measurements on an XScale processor show that our method achieves better execution time than data prefetching at similar or lower power consumption.
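As a rough illustration of the knapsack formulation (a sketch, not the paper's actual implementation), assume each locking candidate has a size (the number of cache lines it would occupy) and a profit (an estimate of the misses avoided by locking it, e.g. from profiling); choosing what to lock within the lockable cache capacity is then the classic 0/1 knapsack problem, solvable with a textbook dynamic program:

```python
def lock_selection(sizes, profits, capacity):
    """0/1 knapsack DP: pick data blocks to lock within cache capacity.

    sizes[i]   -- cache lines occupied by candidate i (hypothetical units)
    profits[i] -- estimated benefit (e.g. misses avoided) of locking i
    capacity   -- number of lockable cache lines
    Returns (max total profit, sorted indices of candidates to lock).
    """
    n = len(sizes)
    # best[i][c] = max profit using the first i candidates and capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, p = sizes[i - 1], profits[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]          # skip candidate i-1
            if s <= c and best[i - 1][c - s] + p > best[i][c]:
                best[i][c] = best[i - 1][c - s] + p  # lock candidate i-1
    # Backtrack to recover which candidates were chosen.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= sizes[i - 1]
    return best[n][capacity], sorted(chosen)

# Example: three candidates, 5 lockable lines available.
# Locking candidates 0 and 1 (sizes 2+3) yields profit 3+4 = 7,
# beating candidate 2 alone (profit 5).
profit, picks = lock_selection([2, 3, 4], [3, 4, 5], 5)
```

The DP runs in O(n x capacity) time, which is practical at compile time since both the number of candidates and the cache size are small.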