Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
A model for estimating trace-sample miss ratios
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation
IEEE Transactions on Computers
Fast out-of-order processor simulation using memoization
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Direct execution models of processor behavior and performance
WSC '87 Proceedings of the 19th conference on Winter simulation
A universal technique for fast and flexible instruction-set architecture simulation
Proceedings of the 39th annual Design Automation Conference
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches
IEEE Transactions on Computers
Reducing State Loss For Effective Trace Sampling of Superscalar Processors
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Accuracy and Speedup of Parallel Trace-Driven Architectural Simulation
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Instruction set compiled simulation: a technique for fast and flexible instruction set simulation
Proceedings of the 40th annual Design Automation Conference
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Self-Monitored Adaptive Cache Warm-Up for Microprocessor Simulation
SBAC-PAD '04 Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Characterizing and Comparing Prevailing Simulation Techniques
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Accelerated warmup for sampled microarchitecture simulation
ACM Transactions on Architecture and Code Optimization (TACO)
TurboSMARTS: accurate microarchitecture simulation sampling in minutes
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging
Proceedings of the 32nd annual international symposium on Computer Architecture
BLRL: Accurate and Efficient Warmup for Sampled Processor Simulation
The Computer Journal
Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation
ISPASS '03 Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software
Accelerating Multiprocessor Simulation with a Memory Timestamp Record
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Intrinsic Checkpointing: A Methodology for Decreasing Simulation Time Through Binary Modification
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
The Strong correlation Between Code Signatures and Performance
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
Automatic logging of operating system effects to guide application-level architecture simulation
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Efficient Sampling Startup for SimPoint
IEEE Micro
Efficiently exploring architectural design spaces via predictive modeling
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A Sampling Method Focusing on Practicality
IEEE Micro
NSL-BLRL: Efficient CacheWarmup for Sampled Processor Simulation
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
Efficient architectural design space exploration via predictive modeling
ACM Transactions on Architecture and Code Optimization (TACO)
Finding Stress Patterns in Microprocessor Workloads
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Branch Predictor Warmup for Sampled Simulation through Branch History Matching
Transactions on High-Performance Embedded Architectures and Compilers II
Branch history matching: branch predictor warmup for sampled simulation
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Statistical sampling of microarchitecture simulation
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Finding extreme behaviors in microprocessor workloads
Transactions on High-Performance Embedded Architectures and Compilers IV
Hi-index | 0.00 |
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months. Statistical sampling and sample techniques like SimPoint that pick small sets of execution samples have been shown to provide accurate results while significantly reducing simulation time. The inefficiencies in sampling are (a) needing the correct memory image to execute the sample, and (b) needing a warm architecture state when simulating the sample. In this paper we examine efficient Sampling Startup techniques addressing two issues: how to represent the correct memory image during simulation, and how to deal with warmup. Representing the correct memory image ensures the memory values consumed during the sample's simulation are correct. Warmup techniques focus on reducing error due to the architecture state not being fully representative of the complete execution that proceeds the sample to be simulated. This paper presents several Sampling Startup techniques and compares them against previously proposed techniques. The end result is a practical sampled simulation methodology that provides accurate performance estimates of complete benchmark executions in the order of minutes.