Fast, accurate design space exploration of embedded systems memory configurations

Authors:
Jason D. Hiser;Jack W. Davidson;David B. Whalley
Affiliations:
University of Virginia, Charlottesville, VA;University of Virginia, Charlottesville, VA;Florida State University, Tallahassee, FL
Venue:
Proceedings of the 2007 ACM symposium on Applied computing
Year:
2007


Abstract

The memory hierarchy is often a critical component of an embedded system, and it can have a dramatic impact on the overall cost, performance, and power consumption of the system. Consequently, designers spend considerable time evaluating potential memory system designs. Unfortunately, the range of options in the memory hierarchy (e.g., number, size, and type of caches, on-chip SRAM, DRAM, EPROM, etc.) makes thorough exploration of the design space using typical simulation techniques infeasible. This paper describes a fast, accurate technique to estimate an application's average memory latency on a set of memory hierarchies. The technique is fast: it runs two orders of magnitude faster than a full simulation. It is also accurate: extensive measurements show that 70% of the estimates were within 1 percentage point of the actual cycle count, while over 99% of all estimates were within 10 percentage points of the actual cycle count. This technique lets the embedded system designer explore the design space of potential memory hierarchies more fully and select the one that best meets the system's design requirements.
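To make the quantity being estimated concrete, the sketch below computes an average memory latency for a few candidate hierarchies using the textbook weighted-latency formula, then picks the lowest. It is only an illustration and not the estimation technique from the paper, which the abstract does not detail. The names `Level`, `average_memory_latency`, `best_hierarchy`, and `percent_error`, as well as all hit rates and latencies, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Level:
    """One level of a memory hierarchy (all values are illustrative)."""
    name: str
    hit_rate: float  # fraction of accesses reaching this level that hit here
    latency: float   # cycles needed to access this level


def average_memory_latency(levels: List[Level]) -> float:
    """Textbook average latency: every access that reaches a level pays
    that level's latency; misses continue to the next level."""
    total = 0.0
    reach = 1.0  # fraction of all accesses that reach the current level
    for level in levels:
        total += reach * level.latency
        reach *= 1.0 - level.hit_rate
    return total


def best_hierarchy(candidates: Dict[str, List[Level]]) -> str:
    """Return the name of the candidate with the lowest average latency."""
    return min(candidates, key=lambda name: average_memory_latency(candidates[name]))


def percent_error(estimate: float, actual: float) -> float:
    """Absolute error of an estimate relative to a measured value, in percent."""
    return 100.0 * abs(estimate - actual) / actual


# Hypothetical candidates; the last level always hits (hit_rate = 1.0).
candidates = {
    "cache+dram": [Level("L1 cache", 0.90, 1.0), Level("DRAM", 1.0, 100.0)],
    "sram+dram": [Level("on-chip SRAM", 0.80, 1.0), Level("DRAM", 1.0, 100.0)],
    "dram-only": [Level("DRAM", 1.0, 100.0)],
}

for name, levels in candidates.items():
    print(f"{name}: {average_memory_latency(levels):.2f} cycles per access")
print("lowest latency:", best_hierarchy(candidates))
```

In a real exploration the hit rates would come from an estimator or a simulation, and `percent_error` could be used to compare an estimate against a full-simulation result, in the spirit of the accuracy figures quoted in the abstract.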