Automatic and efficient evaluation of memory hierarchies for embedded systems

Authors:
Santosh G. Abraham; Scott A. Mahlke
Affiliations:
Hewlett-Packard Laboratories, Palo Alto, CA; Hewlett-Packard Laboratories, Palo Alto, CA
Venue:
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Year:
1999

Abstract

Automation is key to designing future embedded systems because it permits application-specific customization while keeping design costs low. A central problem for automatic design systems is evaluating the performance of a vast number of alternative designs in a timely manner. In this paper, we focus on an embedded system consisting of a VLIW processor, an instruction cache, a data cache, and a second-level unified cache. We use a hierarchical approach that partitions the system into its constituent components and evaluates each component individually. The performance of each processor is evaluated independently of its memory hierarchy, and each cache is simulated using the address traces of a single reference processor. Because changes in the processor architecture do affect the address traces, and thus the performance of the memory hierarchy, this approach alone yields an inaccurate estimate of overall performance. To reduce this error, we model changes in the processor architecture as a dilation of the reference processor's address trace, in which each instruction block in the trace is conceptually stretched by the dilation coefficient. This yields a projected cache performance that more accurately accounts for changes in the processor architecture. To understand the accuracy of the dilation model, we separate the errors that the model can introduce and quantify each of them on a set of benchmarks. The results show that the dilation model is effective for most of the design space and enables efficient automatic design.
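The abstract does not state the dilation formula, so the following is only a minimal Python sketch of the idea under explicit assumptions: a single uniform dilation coefficient scales both the start address and the size of every instruction block, and the instruction cache is direct-mapped. The names `BlockRef`, `dilate_trace`, and `icache_misses` are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class BlockRef:
    """One executed instruction block from a (hypothetical) reference trace."""
    start: int  # starting byte address on the reference processor
    size: int   # length of the block in bytes on the reference processor


def dilate_trace(trace: Iterable[BlockRef], dilation: float) -> List[BlockRef]:
    """Stretch every block by `dilation` (assumed uniform for the whole program).

    Both the start address and the size are scaled, so blocks keep their
    relative layout while the code image grows (dilation > 1) or shrinks
    (dilation < 1). Results are rounded to whole bytes.
    """
    return [
        BlockRef(start=int(round(b.start * dilation)),
                 size=max(1, int(round(b.size * dilation))))
        for b in trace
    ]


def icache_misses(trace: Iterable[BlockRef], line_bytes: int, num_lines: int) -> int:
    """Count misses in a direct-mapped instruction cache (a simplifying assumption).

    Every cache line touched by a block is fetched in order; a line maps to
    slot (line number mod num_lines) and misses if that slot holds another line.
    """
    resident: List[Optional[int]] = [None] * num_lines  # line number held by each slot
    misses = 0
    for block in trace:
        first = block.start // line_bytes
        last = (block.start + block.size - 1) // line_bytes
        for line in range(first, last + 1):
            slot = line % num_lines
            if resident[slot] != line:
                resident[slot] = line
                misses += 1
    return misses


if __name__ == "__main__":
    # Toy reference trace: two 32-byte blocks, the first one re-executed.
    reference = [BlockRef(0, 32), BlockRef(32, 32), BlockRef(0, 32)]
    # Same trace projected onto a hypothetical processor whose code is twice as large.
    dilated = dilate_trace(reference, 2.0)
    print(icache_misses(reference, line_bytes=32, num_lines=2))  # 2
    print(icache_misses(dilated, line_bytes=32, num_lines=2))    # 6
```

In this toy case, doubling the code footprint makes the two blocks conflict in the two-line cache, so the re-executed block misses again. Note that only the cache simulation is repeated for the candidate processor; the single reference trace is reused rather than regenerated.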