Embedded applications must meet their deadlines to avoid undesired outcomes. Multiprocessor Systems-on-Chip (MPSoCs) play a vital role in helping contemporary embedded devices meet those deadlines. Such MPSoCs include two-level cache hierarchies, which have to be dimensioned carefully to support the timing deadlines of the application(s) while consuming minimum area and therefore minimum power. Given the deadline of an application, it is possible to systematically derive the maximum time that may be spent on memory accesses, which can then be used to dimension suitable cache sizes. As the dimensioning has to be done rapidly to satisfy time-to-market requirements, we choose a well-established rapid cache simulation strategy, single-pass trace-driven simulation, for estimating the cache dimensions. For the first time, we address the two main challenges, coherency and scalability, in adapting a single-pass simulator to an MPSoC with a two-level cache hierarchy. The challenges are addressed through a modular bottom-up simulation technique in which L1 and L2 simulations are handled in independent communicating modules. In this paper, we present how the dimensioning is performed for a two-level inclusive data cache hierarchy in an MPSoC. With the proposed rapid simulation, cache size estimates are produced within an hour (worst case across the application benchmarks considered). We evaluated our approach on task-based MPSoC implementations of JPEG and H264 benchmarks and achieved average timing deviations of 16.1% and 7.2%, respectively, against the requested data access times. The deviations are always positive, meaning that our simulator guarantees that the requested data access time is satisfied. In addition, we generated a set of synthetic memory traces and used them to analyse our simulator extensively. For the synthetic traces, our simulator provides cache sizes that always guarantee the requested data access time, with an average deviation below 14.5%.
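
The general idea of single-pass trace-driven simulation, and of picking a cache size from a memory access time budget, can be illustrated with a minimal sketch. It is not the simulator described in this paper: it assumes a single fully associative LRU cache on one core, ignores coherency and the second cache level, and uses the classic LRU stack-distance approach so that one pass over the trace yields miss counts for every candidate size. All function names, the block size, and the hit and miss latencies are hypothetical values chosen for illustration.

```python
def stack_distance_histogram(trace, block_size=32):
    """Single pass over an address trace (assumed fully associative LRU).

    Returns (hist, cold) where hist[d] counts re-references at LRU stack
    distance d and cold counts first-time references.
    """
    stack = []  # index 0 holds the most recently used block
    hist = {}
    cold = 0
    for addr in trace:
        block = addr // block_size
        if block in stack:
            d = stack.index(block)  # distinct blocks touched since last use
            hist[d] = hist.get(d, 0) + 1
            stack.pop(d)
        else:
            cold += 1
        stack.insert(0, block)
    return hist, cold


def misses_for_capacity(hist, cold, capacity_blocks):
    """A reference hits in an LRU cache of C blocks iff its distance < C."""
    return cold + sum(n for d, n in hist.items() if d >= capacity_blocks)


def smallest_capacity(trace, budget_cycles, candidates,
                      hit_cycles=1, miss_cycles=20, block_size=32):
    """Return (capacity, estimated_cycles) for the smallest candidate
    capacity whose estimated access time fits the budget, or None.
    The latencies are illustrative assumptions, not measured values."""
    hist, cold = stack_distance_histogram(trace, block_size)
    total = len(trace)
    for capacity in sorted(candidates):
        misses = misses_for_capacity(hist, cold, capacity)
        cycles = (total - misses) * hit_cycles + misses * miss_cycles
        if cycles <= budget_cycles:
            return capacity, cycles
    return None


if __name__ == "__main__":
    trace = [0, 32, 64, 0, 32, 64]  # three blocks, each accessed twice
    print(smallest_capacity(trace, budget_cycles=100, candidates=[1, 2, 4, 8]))
```

Because the histogram is built once and reused for every candidate, the trace is read only a single time regardless of how many cache sizes are explored, which is the property that makes single-pass methods attractive for rapid dimensioning.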