Using cache mapping to improve memory performance of handheld devices
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Applications running on the StrongARM SA-1110 or XScale processor cores can specify a cache mapping for each virtual page to achieve better cache utilization. In this work, we describe a method to perform cache mapping efficiently. Under this scheme, we select a number of loops for sampling; these loops are chosen automatically based on clock profiling information. We formulate the optimal cache mapping problem as an Integer Linear Programming (ILP) problem. Experiments on 14 test programs show speedups over the default mapping in 13 of them after applying our sample-based cache mapping scheme; the geometric mean of the speedups across all 14 programs is 1.098. Furthermore, compared with a previous heuristic method that uses the full memory trace, the sample-based method performs cache mapping an order of magnitude faster without sacrificing the quality of the mapping.
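The core optimization the abstract describes, assigning each virtual page a cache mapping so that hot pages observed in a sampled trace do not conflict, can be illustrated with a small sketch. This is not the paper's ILP formulation or cost model; it is a hypothetical brute-force stand-in (the function names `estimate_conflicts` and `best_mapping` and the single-line-per-color conflict model are illustrative assumptions) that only conveys the shape of the problem on tiny inputs:

```python
from itertools import product

def estimate_conflicts(trace, mapping, num_colors):
    # Rough stand-in for conflict misses: each cache "color" holds one
    # page's line at a time, so an access to a page whose color was last
    # used by a different page counts as a miss. (Illustrative model only.)
    last_in_color = {}
    misses = 0
    for page in trace:
        color = mapping[page]
        if last_in_color.get(color) not in (None, page):
            misses += 1
        last_in_color[color] = page
    return misses

def best_mapping(trace, pages, num_colors):
    # Exhaustively try every page -> color assignment and keep the one
    # with the fewest estimated conflicts. The paper solves this kind of
    # assignment with ILP at realistic scale; brute force is only viable
    # for a handful of pages.
    best, best_cost = None, float("inf")
    for assignment in product(range(num_colors), repeat=len(pages)):
        mapping = dict(zip(pages, assignment))
        cost = estimate_conflicts(trace, mapping, num_colors)
        if cost < best_cost:
            best, best_cost = mapping, cost
    return best, best_cost
```

For a sampled trace that alternates heavily between pages A and B, any good assignment separates A and B into different colors, which is the intuition behind steering page mappings with sampled loop traces rather than a full memory trace.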