Dhrystone: a synthetic systems programming benchmark
Communications of the ACM
Measuring Experimental Error in Microprocessor Simulation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Rapid Exploration of Pipelined Processors through Automatic Generation of Synthesizable RTL Models
RSP '03 Proceedings of the 14th IEEE International Workshop on Rapid System Prototyping (RSP'03)
Early analysis tools for system-on-a-chip design
IBM Journal of Research and Development
Improved automatic testcase synthesis for performance model validation
Proceedings of the 19th annual international conference on Supercomputing
Automatic performance model construction for the fast software exploration of new hardware designs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Calibration of abstract performance models for system-level design space exploration
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
IBM Journal of Research and Development
Functional verification of the POWER4 microprocessor and POWER4 multiprocessor systems
IBM Journal of Research and Development
Automatic calibration of performance models on heterogeneous multicore architectures
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Hi-index | 0.00 |
Performance models are typically written by hand for a new model or assembled piece-meal from the prior simulation code of an old model. In either case, many man-months of work may be required to write the new model and validate design details against a prior or current design. In reality, the majority of information about the performance of the design already exists in the design structure of either the old hardware model or the new model or both. To harvest this information and eliminate the significant duplicate coding and validation efforts, we propose that a performance model be automatically synthesized from a prior or current hardware design using a bottom-up, design-oriented approach. We demarcate the performance-critical boundaries of the design and perform backward-trace cone analysis to identify logic to include in the performance model. We then abstract specific components for design changes and expend modeling effort only on the few functions relevant to a particular design study. Engineering effort then becomes focused on workload selection and quality, defining and projecting new designs, and assessing design tradeoffs and sensitivities - the small set of tasks with the highest potential to improve design performance. We present a case-study that shows that even the simplest proposed transformations on a high-performance IBM L2 cache design result in a simulation speedup of 3.9, with evidence that an order of magnitude speedup can be obtained using a few additional modeling abstractions.