ACM Transactions on Computer Systems (TOCS)
The Stack Growth Function: Cache Line Reference Models
IEEE Transactions on Computers
The impact of code density on instruction cache performance
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Cost-effective design of application specific VLIW processors using the SCARCE framework
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Fast instruction cache performance evaluation using compile-time analysis
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
A Model of Workloads and its Use in Miss-Rate Prediction for Fully Associative Caches
IEEE Transactions on Computers
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Expected I-cache miss rates via the gap model
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A probabilistic method for calculating hit ratios in direct mapped caches
Journal of Network and Computer Applications
An Analytical Model for Designing Memory Hierarchies
IEEE Transactions on Computers
Application-driven synthesis of core-based systems
ICCAD '97 Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Empirical models of miss rates
Parallel Computing
Automated design of finite state machine predictors for customized processors
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Morphable Cache Architectures: Potential Benefits
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Data remapping for design space optimization of embedded memory systems
ACM Transactions on Embedded Computing Systems (TECS)
Embedded Computing: New Directions in Architecture and Automation
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
ACM Transactions on Embedded Computing Systems (TECS)
Dynamic on-chip memory management for chip multiprocessors
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Balancing design options with Sherpa
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Optimal topology exploration for application-specific 3D architectures
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Fast, accurate design space exploration of embedded systems memory configurations
Proceedings of the 2007 ACM symposium on Applied computing
Reducing complexity of multiobjective design space exploration in VLIW-based embedded systems
ACM Transactions on Architecture and Code Optimization (TACO)
The shape of the processor design space and its implications for early stage explorations
ACMOS'05 Proceedings of the 7th WSEAS international conference on Automatic control, modeling and simulation
Hi-index | 0.00 |
Automation is the key to the design of future embedded systems as it permits application-specific customization while keeping design costs low. A key problem faced by automatic design systems is evaluating the performance of the vast number of alternative designs in a timely manner. For this paper, we focus on an embedded system consisting of the following components: a VLIW processor, instruction cache, data cache, and second-level unified cache. A hierarchical approach of partitioning the system into its constituent components and evaluating each component individually is utilized. The performance of each processor is evaluated independent of its memory hierarchy, and each of the caches is simulated using the traces from a single reference processor. Since the changes in the processor architecture do indeed affect the address traces and thus the performance of the memory hierarchy, the overall performance is inaccurate. To overcome this error, the changes in the processor architecture are modeled as a dilation of the reference processor's address trace, where each instruction block in the trace is conceptually stretched out by the dilation coefficient. This approach provides a projected cache performance that more accurately accounts for changes in the processor architecture. In order to understand the accuracy of the dilation model, we separate the possible errors that the model introduces and quantify these errors on a set of benchmarks. The results show the dilation model is effective for most of the design space and facilitates efficient automatic design.