Architectural simulation of microprocessors is extremely time-consuming because of the ever-increasing complexity of current applications. To exercise current hardware with realistic workloads, benchmarks must have huge dynamic instruction counts. For example, the SPEC CPU2000 benchmark suite contains benchmarks with dynamic instruction counts of several hundred billion instructions. This is beneficial for evaluating real hardware. However, simulating these workloads is impractical, if not impossible, given that many simulation runs are needed to evaluate a large number of design points. Trace sampling is often used as a practical solution to this problem. In trace sampling, several representative samples are chosen from a real program trace. Since the sampled trace is much shorter than the original trace, a significant simulation speedup is obtained. In this paper, we study which sample length achieves a given level of accuracy while maximizing the total simulation speedup. From various experiments using SPEC CPU2000, we conclude that the optimal sample length (i) is not fixed across benchmarks, and (ii) increases with increasing warmup length. We therefore propose an algorithm that determines the optimal sample length per benchmark under different warmup scenarios. This is done within the context of sampled cache simulation.
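To make the trade-off between sample length, warmup length, and speedup concrete, the following minimal Python sketch simulates a toy direct-mapped cache over evenly spaced trace samples. It is an illustration only and not the algorithm proposed in the paper: the cache geometry, the cold-start-plus-warmup policy, the even spacing of samples, and all names (`DirectMappedCache`, `sampled_miss_ratio`, and so on) are assumptions made for this example. It also assumes that each sample plus its warmup fits within one sampling period, so that samples do not overlap.

```python
class DirectMappedCache:
    """Toy direct-mapped cache; the geometry is illustrative only."""

    def __init__(self, num_sets=256, block_size=32):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [None] * num_sets

    def access(self, address):
        """Install the referenced block and return True on a miss."""
        block = address // self.block_size
        index = block % self.num_sets
        miss = self.tags[index] != block
        self.tags[index] = block
        return miss


def full_miss_ratio(trace):
    """Reference result: simulate every address in the trace."""
    cache = DirectMappedCache()
    misses = sum(cache.access(a) for a in trace)
    return misses / len(trace)


def sampled_miss_ratio(trace, num_samples, sample_len, warmup_len):
    """Estimate the miss ratio from evenly spaced samples.

    Each sample starts from an empty cache, replays `warmup_len`
    references without counting them, and then counts misses over the
    next `sample_len` references. Returns (estimate, speedup), where
    speedup is the trace length divided by the number of references
    actually simulated (warmup included).
    """
    period = len(trace) // num_samples
    misses = counted = simulated = 0
    for s in range(num_samples):
        start = s * period
        warm_end = min(start + warmup_len, len(trace))
        end = min(warm_end + sample_len, len(trace))
        cache = DirectMappedCache()
        for a in trace[start:warm_end]:
            cache.access(a)          # warm the cache, do not count
        for a in trace[warm_end:end]:
            misses += cache.access(a)
            counted += 1
        simulated += end - start
    return misses / counted, len(trace) / simulated
```

In this sketch, a longer warmup reduces the cold-start bias of each sample but adds simulated references that contribute nothing to the estimate, so the achievable speedup drops. This is the kind of interaction between warmup and sample length that the study examines.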