Cache performance of operating system and multiprogramming workloads
ACM Transactions on Computer Systems (TOCS)
Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
A model for estimating trace-sample miss ratios
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Cold-start vs. warm-start miss ratios
Communications of the ACM
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches
IEEE Transactions on Computers
Reducing State Loss For Effective Trace Sampling of Superscalar Processors
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Representative Traces for Processor Models with Infinite Cache
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
On the Predictability of Program Behavior Using Different Input Data Sets
INTERACT '02 Proceedings of the Sixth Annual Workshop on Interaction between Compilers and Computer Architectures
Minimal Subset Evaluation: Rapid Warm-Up for Simulated Hardware State
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Automatic Synthesis of High-Speed Processor Simulators
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
BLRL: Accurate and Efficient Warmup for Sampled Processor Simulation
The Computer Journal
Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation
ISPASS '03 Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software
The Strong correlation Between Code Signatures and Performance
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Efficient sampling startup for sampled processor simulation
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Empirical performance assessment using soft-core processors on reconfigurable hardware
Proceedings of the 2007 workshop on Experimental computer science
Empirical performance assessment using soft-core processors on reconfigurable hardware
ecs'07 Experimental computer science on Experimental computer science
A Simulation Framework for Rapid Analysis of Reconfigurable Computing Systems
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
COREMU: a scalable and portable parallel full-system emulator
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Synchronization for hybrid MPSoC full-system simulation
Proceedings of the 49th Annual Design Automation Conference
Fast architecture evaluation of heterogeneous MPSoCs by host-compiled simulation
Proceedings of the 15th International Workshop on Software and Compilers for Embedded Systems
Hybrid simulation for extensible processor cores
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. This article presents the Sampling Microarchitecture Simulation (SMARTS) framework as an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates.Analysis of the SPEC CPU2000 benchmark suite shows that CPI and energy per instruction (EPI) can be estimated to within ±3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in microarchitectural state initialization introduces an additional uncertainty which we empirically bound to ∼2% for the tested benchmarks. Our implementation of SMARTS achieves an actual average error of only 0.64% on CPI and 0.59% on EPI for the tested benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively.