Application-specific system-on-chip platforms create the opportunity to customize the cache configuration for optimal performance with minimal chip area. Simulation, in particular trace-driven simulation, is widely used to estimate cache hit rates. However, simulation is too slow for design space exploration, especially when there are hundreds of design points and the traces are huge. In this article, we propose a novel analytical approach for design space exploration of instruction caches. Given the program control flow graph (CFG) annotated only with basic block and control flow edge execution counts, we first model the cache states at each point of the CFG probabilistically. Then, we exploit the structural similarities among related cache configurations to estimate the cache hit rates for multiple cache configurations in one pass. Experimental results indicate that our analysis is 28 to 2,500 times faster than the fastest known cache simulator while maintaining high accuracy (0.2% average error) in estimating cache hit rates for a large set of popular benchmarks. Moreover, compared to a state-of-the-art cache design space exploration technique, our approach achieves a speedup of 304 to 8,086 times and saves up to 62% (average 7%) energy for the evaluated benchmarks.
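To make the cost of the simulation baseline concrete, the following is a minimal sketch of trace-driven simulation for a set-associative cache. It is not the analytical method proposed in the article. It assumes LRU replacement, and the function name `simulate_hit_rate`, the synthetic trace, and the list of configurations are hypothetical choices made for illustration only.

```python
from collections import OrderedDict


def simulate_hit_rate(trace, num_sets, assoc, line_size):
    """Replay an address trace through a set-associative cache.

    Assumes LRU replacement (an illustrative choice, not taken from the
    article). Returns the fraction of accesses that hit.
    """
    # One ordered dict per set: keys are tags, ordered from least to most
    # recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        block = addr // line_size   # memory block holding this address
        index = block % num_sets    # cache set the block maps to
        tag = block // num_sets     # identifies the block within its set
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)  # mark as most recently used
        else:
            if len(lines) >= assoc:
                lines.popitem(last=False)  # evict the least recently used
            lines[tag] = True
    return hits / len(trace) if trace else 0.0


if __name__ == "__main__":
    # Hypothetical instruction trace: a loop over four 16-byte blocks.
    trace = [0, 16, 32, 48] * 100
    # Every design point requires its own full pass over the trace.
    for num_sets, assoc in [(1, 1), (2, 1), (2, 2), (4, 1)]:
        rate = simulate_hit_rate(trace, num_sets, assoc, line_size=16)
        print(f"sets={num_sets} ways={assoc} hit rate={rate:.2f}")
```

Each configuration replays the whole trace, so the total work grows with the number of design points multiplied by the trace length. This is the cost that a one-pass estimate over multiple configurations is meant to avoid.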