A sample-based cache mapping scheme

Authors:
Rong Xu;Zhiyuan Li
Affiliations:
Purdue University, West Lafayette, IN;Purdue University, West Lafayette, IN
Venue:
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Year:
2005

Citing 12
Cited 1

A data cache with multiple caching strategies tuned to different types of locality

ICS '95 Proceedings of the 9th international conference on Supercomputing
A modified approach to data cache management

Proceedings of the 28th annual international symposium on Microarchitecture
Run-time adaptive cache hierarchy management via reference analysis

Proceedings of the 24th annual international symposium on Computer architecture
Utilizing reuse information in data cache management

ICS '98 Proceedings of the 12th international conference on Supercomputing
Run-Time Cache Bypassing

IEEE Transactions on Computers
A framework for reducing the cost of instrumented code

Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
An efficient profile-analysis framework for data-layout optimizations

POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches

IEEE Transactions on Computers
Improving Data Locality by Array Contraction

IEEE Transactions on Computers
Using cache mapping to improve memory performance handheld devices

ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Page mapping for heterogeneously partitioned caches: Complexity and heuristics

Journal of Embedded Computing - Cache exploitation in embedded systems

Fast, accurate design space exploration of embedded systems memory configurations

Proceedings of the 2007 ACM symposium on Applied computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Applications running on the StrongARM SA-1110 or XScale processor cores can specify cache mapping for each virtual page to achieve better cache utilization. In this work, we describe a method to efficiently perform cache mapping. Under this scheme, we select a number of loops for sampling. These loops are selected automatically based on clock profiling information. We formulate the optimal cache mapping problem as an Integer Linear Programming (ILP) problem. Experiments performed on 14 test programs show speedups in 13 of them (over the default mapping) after applying our sample-based cache mapping scheme. The geometric mean of program speedups for all the 14 test programs is 1.098. Furthermore, compared with a previous heuristic method which uses the full memory trace, the sample-based method performs cache mapping faster by an order of magnitude without sacrificing the quality of mapping.