Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
A portable global optimizer and linker
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Efficient trace-driven simulation methods for cache performance analysis
ACM Transactions on Computer Systems (TOCS)
A Model of Workloads and its Use in Miss-Rate Prediction for Fully Associative Caches
IEEE Transactions on Computers
Exploiting dual data-memory banks in digital signal processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
An Analytical Model for Designing Memory Hierarchies
IEEE Transactions on Computers
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation
IEEE Transactions on Computers
Iterative cache simulation of embedded CPUs with trace stripping
CODES '99 Proceedings of the seventh international workshop on Hardware/software codesign
Memory exploration for low power, embedded systems
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Automatic and efficient evaluation of memory hierarchies for embedded systems
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Co-design of interleaved memory systems
CODES '00 Proceedings of the eighth international workshop on Hardware/software codesign
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
A recursive algorithm for low-power memory partitioning
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Polynomial-time algorithm for on-chip scratchpad memory partitioning
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
EMBARC: an efficient memory bank assignment algorithm for retargetable compilers
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A sample-based cache mapping scheme
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Architecture exploration for efficient data transfer and storage in data-parallel applications
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Hi-index | 0.00 |
The memory hierarchy is often a critical component of an embedded system. An embedded system's memory hierarchy can have dramatic impact on the overall cost, performance, and power consumption of the system. Consequently, designers spend considerable time evaluating potential memory system designs. Unfortunately, the range of options in the memory hierarchy (e.g., number, size, and type of caches, on-chip SRAM, DRAM, EPROM, etc.) makes thorough exploration of the design space using typical simulation techniques infeasible. This paper describes a fast, accurate technique to estimate an application's average memory latency on a set of memory hierarchies. The technique is fast---two orders of magnitude faster than a full simulation. It is also accurate---extensive measurements show that 70% of the estimates were within 1 percentage point of the actual cycle count while over 99% of all estimates were within 10 percentage points of the actual cycle count. This fast, accurate technique provides the embedded system designer the ability to more fully explore the design space of potential memory hierarchies and select the one that best meets the system's design requirements.