An analytical approach for fast and accurate design space exploration of instruction caches

Authors:
Yun Liang; Tulika Mitra
Affiliations:
Peking University and Advanced Digital Science Center, University of Illinois at Urbana-Champaign, Beijing, P.R. China; National University of Singapore, Singapore
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2013

Abstract

Application-specific system-on-chip platforms create the opportunity to customize the cache configuration for optimal performance with minimal chip area. Simulation, in particular trace-driven simulation, is widely used to estimate cache hit rates. However, simulation is too slow for design space exploration, especially when there are hundreds of design points and the traces are huge. In this article, we propose a novel analytical approach for design space exploration of instruction caches. Given the program control flow graph (CFG) annotated only with basic block and control flow edge execution counts, we first model the cache states at each point of the CFG in a probabilistic manner. Then, we exploit the structural similarities among related cache configurations to estimate the cache hit rates for multiple cache configurations in one pass. Experimental results indicate that our analysis is 28 to 2,500 times faster than the fastest known cache simulator while maintaining high accuracy (0.2% average error) in estimating cache hit rates for a large set of popular benchmarks. Moreover, compared to a state-of-the-art cache design space exploration technique, our approach achieves a speedup of 304 to 8,086 times and saves up to 62% (average 7%) energy for the evaluated benchmarks.
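
To make the cost of the simulation baseline concrete, the following is a minimal sketch of trace-driven simulation over a small design space. It is not the analytical method proposed in the article. It assumes an LRU set-associative instruction cache, and the names `simulate_hit_rate` and `explore` and the parameters `num_sets`, `assoc`, and `line_size` are hypothetical, chosen only for illustration. The point it shows is that every design point requires its own full pass over the trace.

```python
from collections import OrderedDict
from itertools import product


def simulate_hit_rate(trace, num_sets, assoc, line_size):
    """Replay an instruction-address trace through one cache configuration.

    Assumption: LRU replacement, addresses in bytes, line_size in bytes.
    Returns the fraction of accesses that hit in the cache.
    """
    # One ordered dict per set; key order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        block = addr // line_size      # memory block holding this address
        lines = sets[block % num_sets]  # cache set the block maps to
        if block in lines:
            hits += 1
            lines.move_to_end(block)    # mark as most recently used
        else:
            if len(lines) >= assoc:
                lines.popitem(last=False)  # evict the least recently used line
            lines[block] = True
    return hits / len(trace) if trace else 0.0


def explore(trace, set_counts, assocs, line_sizes):
    """Simulate every configuration; cost is one full trace pass per design point."""
    return {
        (s, a, l): simulate_hit_rate(trace, s, a, l)
        for s, a, l in product(set_counts, assocs, line_sizes)
    }


if __name__ == "__main__":
    # Toy trace: a 16-instruction loop (4-byte instructions) executed 10 times.
    toy_trace = [pc for _ in range(10) for pc in range(0, 64, 4)]
    for config, rate in sorted(explore(toy_trace, [1, 2, 4], [1, 2], [16]).items()):
        print(config, round(rate, 3))
```

With hundreds of design points and traces of many millions of addresses, the nested loop in `explore` is what dominates the running time, which is the bottleneck the abstract describes.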