Statistical sampling of microarchitecture simulation

Authors:
Thomas F. Wenisch;Roland E. Wunderlich;Babak Falsafi;James C. Hoe
Affiliations:
Computer Architecture Laboratory, Carnegie Mellon University, Pittsburgh, PA;Computer Architecture Laboratory, Carnegie Mellon University, Pittsburgh, PA;Computer Architecture Laboratory, Carnegie Mellon University, Pittsburgh, PA;Computer Architecture Laboratory, Carnegie Mellon University, Pittsburgh, PA
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Citing 17
Cited 2

Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems

IEEE Transactions on Computers
Fast out-of-order processor simulation using memoization

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
HLS: combining statistical and symbolic simulation to guide microprocessor designs

Proceedings of the 27th annual international symposium on Computer architecture
SPEC CPU2000: Measuring CPU Performance in the New Millennium

Computer
Reducing State Loss For Effective Trace Sampling of Superscalar Processors

ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Modeling Superscalar Processors via Statistical Simulation

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Improving the Accuracy vs. Speed Tradeoff for Simulating Shared-Memory Multiprocessors with ILP Processors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling

Proceedings of the 30th annual international symposium on Computer architecture
Picking Statistically Valid and Early Simulation Points

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
EXPERT: expedited simulation exploiting program behavior repetition

Proceedings of the 18th annual international conference on Supercomputing
Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies

Proceedings of the 31st annual international symposium on Computer architecture
SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation

ISPASS '03 Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software
Intrinsic Checkpointing: A Methodology for Decreasing Simulation Time Through Binary Modification

ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Enhancing Multiprocessor Architecture Simulation Speed Using Matched-Pair Comparison

ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Efficient sampling startup for sampled processor simulation

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers

Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations

IEEE Transactions on Computers
Predicting the performance measures of an optical distributed shared memory multiprocessor by using support vector regression

Expert Systems with Applications: An International Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. The Sampling Microarchitecture Simulation (SMARTS) framework is an approach to enable fast and accurate performance measurements of full-length benchmarks. SMARTS accelerates simulation by selectively measuring in detail only an appropriate benchmark subset. SMARTS prescribes a statistically sound procedure for configuring a systematic sampling simulation run to achieve a desired quantifiable confidence in estimates. Analysis of the SPEC CPU2000 benchmark suite shows that CPI can be estimated to within ± 3% with 99.7% confidence by measuring fewer than 50 million instructions per benchmark. In practice, inaccuracy in microarchitectural state initialization introduces an additional uncertainty which we empirically bound to ∼2% for the tested benchmarks. We present two implementations of SMARTS that both achieve an average error of only 0.64% on CPI. SMARTSim constructs accurate model state through functional warming-continuously warming large microarchitectural structures (e.g., caches and the branch predictor) while functionally simulating the billions of instructions between measurements-reducing average simulation turnaround from 5.5 days to 7.0 hours. TurboSMARTSim replaces functional warming with live-points-checkpoints that store a bare minimum of functionally-warmed state for accurate simulation of a limited execution window-further reducing average turnaround to 91 seconds.