Statistical sampling of microarchitecture simulation

Authors:
Roland E. Wunderlich; Thomas F. Wenisch; Babak Falsafi; James C. Hoe
Affiliations:
Computer Architecture Laboratory at Carnegie Mellon, Pittsburgh, PA (all authors)
Venue:
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Year:
2006

Abstract

Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. This article presents the Sampling Microarchitecture Simulation (SMARTS) framework as an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates.

Analysis of the SPEC CPU2000 benchmark suite shows that CPI and energy per instruction (EPI) can be estimated to within ±3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in microarchitectural state initialization introduces an additional uncertainty which we empirically bound to ∼2% for the tested benchmarks. Our implementation of SMARTS achieves an actual average error of only 0.64% on CPI and 0.59% on EPI for the tested benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively.
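The sampling idea summarized in the abstract can be illustrated with textbook sampling statistics. The sketch below is not the SMARTS implementation. It assumes that per-unit CPI measurements are already available as a list of floats, that the sample mean is approximately normally distributed, and that 99.7% confidence corresponds to a z-score of about 3. The function names (required_sample_size, systematic_sample, estimate_mean) and the coefficient-of-variation value in the usage note are hypothetical, chosen only for illustration.

```python
import math
import statistics


def required_sample_size(cv, rel_error, z):
    """Smallest n with z * cv / sqrt(n) <= rel_error.

    cv        : coefficient of variation (stdev / mean) of the per-unit metric
    rel_error : target relative half-width of the confidence interval (0.03 = ±3%)
    z         : z-score for the desired confidence (about 3 for 99.7%)
    """
    return math.ceil((z * cv / rel_error) ** 2)


def systematic_sample(values, interval, offset=0):
    """Pick every `interval`-th unit, starting at index `offset`."""
    return values[offset::interval]


def estimate_mean(sample, z):
    """Return (sample mean, relative half-width of the confidence interval)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    cv = statistics.stdev(sample) / mean  # needs n >= 2 and a non-zero mean
    return mean, z * cv / math.sqrt(n)
```

As a usage illustration, a hypothetical coefficient of variation of 1.0 with a ±3% target at z = 3 gives roughly 10,000 sampling units from required_sample_size(1.0, 0.03, 3.0). The measured units would then be selected with systematic_sample and summarized with estimate_mean. The interval half-width covers sampling error only; the state-initialization uncertainty mentioned in the abstract is a separate effect that this sketch does not model.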