Accelerated warmup for sampled microarchitecture simulation

Authors:
John W. Haskins, Jr.; Kevin Skadron
Affiliations:
Center for Computing Sciences, Bowie, MD; University of Virginia, Charlottesville, VA
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2005

Abstract

To reduce the cost of cycle-accurate software simulation of microarchitectures, many researchers use statistical sampling: by simulating only a small, representative subset of the end-to-end dynamic instruction stream in cycle-accurate detail, a simulation completes in much less time than simulating the cycle-by-cycle progress of an entire benchmark. For sampled simulation results to accurately reflect the nature of the full dynamic instruction stream, however, state in the simulated cache and branch predictor must match or closely approximate the state that would have been present had cycle-accurate simulation been used for the entire run. Researchers typically address this issue by prefixing a period of warmup (in which cache and branch predictor state are modeled in addition to programmer-visible architected state) to each cluster of contiguous instructions in the sample.

One conservative but slow approach is to always simulate cache and branch predictor state, both within the cycle-accurate clusters and among the instructions preceding each cluster. To save time, warmup heuristics have been proposed, but no single heuristic suits every benchmark. More rigorous, analytical warmup approaches are needed to balance the requirements of high accuracy and speed in sampled simulation. This paper explores this issue and in particular demonstrates the merits of memory reference reuse latency (MRRL).

Relative to the IPC measured by modeling all precluster cache and branch predictor activity, MRRL produced an average IPC error of less than 1% while reducing simulation running times by an average of approximately 50% (or 95% of the maximum potential speedup).
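
The abstract names MRRL but does not spell out how a warmup length is derived from reuse latencies. The following is a minimal illustrative sketch, not the authors' implementation. It assumes that a reuse latency is the number of instructions elapsed between consecutive references to the same memory address, and that the warmup length is chosen as the smallest latency covering a given fraction of observed reuses, capped at the precluster length. The function names, the trace format, and the percentile parameter are all hypothetical.

```python
import math


def reuse_latencies(trace):
    """Return reuse latencies for a memory reference trace.

    `trace` is an assumed format: a list of (instruction_index, address)
    pairs in program order. A latency is recorded each time an address
    is referenced again after an earlier reference.
    """
    last_seen = {}
    latencies = []
    for instr_index, address in trace:
        if address in last_seen:
            latencies.append(instr_index - last_seen[address])
        last_seen[address] = instr_index
    return latencies


def warmup_length(latencies, percentile, precluster_len):
    """Pick a warmup length (in instructions) before a cluster.

    Assumption: choose the smallest latency such that at least
    `percentile` (a fraction in (0, 1]) of observed reuses are no
    longer than it, and never warm up for longer than the precluster.
    """
    if not latencies:
        return 0
    ordered = sorted(latencies)
    rank = max(1, math.ceil(percentile * len(ordered)))
    return min(ordered[rank - 1], precluster_len)


# Toy trace (hypothetical): addresses A and B are reused, C is reused once.
trace = [(0, "A"), (2, "B"), (5, "A"), (6, "B"), (20, "A"), (21, "C"), (22, "C")]
lats = reuse_latencies(trace)
w = warmup_length(lats, percentile=0.75, precluster_len=100)
```

In this sketch, instructions earlier than `w` before the cluster would be skipped with only architected state updated, and cache and branch predictor modeling would be enabled for the last `w` precluster instructions. A higher percentile trades simulation speed for warmup accuracy.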