SMA: a self-monitored adaptive cache warm-up scheme for microprocessor simulation
International Journal of Parallel Programming
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Efficient Sampling Startup for SimPoint
IEEE Micro
Efficiently exploring architectural design spaces via predictive modeling
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
NSL-BLRL: Efficient CacheWarmup for Sampled Processor Simulation
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
Efficient architectural design space exploration via predictive modeling
ACM Transactions on Architecture and Code Optimization (TACO)
Finding Stress Patterns in Microprocessor Workloads
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Branch Predictor Warmup for Sampled Simulation through Branch History Matching
Transactions on High-Performance Embedded Architectures and Compilers II
Branch history matching: branch predictor warmup for sampled simulation
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Efficient sampling startup for sampled processor simulation
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Finding extreme behaviors in microprocessor workloads
Transactions on High-Performance Embedded Architectures and Compilers IV
Warm-Up Simulation Methodology for HW/SW Co-Designed Processors
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
Current computer architecture research relies heavily on architectural simulation to obtain insight into the cycle-level behavior of modern microarchitectures. Unfortunately, such architectural simulations are extremely time-consuming. Sampling is an often-used technique to reduce the total simulation time. This is achieved by selecting a limited number of samples from a complete benchmark execution. One important issue with sampling, however, is the unknown hardware state at the beginning of each sample. Several approaches have been proposed to address this problem by warming up the hardware state before each sample. This paper presents the boundary line reuse latency (BLRL) which is an accurate and efficient warmup strategy. BLRL considers reuse latencies (between memory references to the same memory location) that cross the boundary line between the pre-sample and the sample to compute the warmup that is required for each sample. This guarantees a nearly perfect warmup state at the beginning of a sample. Our experimental results obtained using detailed processor simulation of SPEC CPU2000 benchmarks show that BLRL significantly outperforms the previously proposed memory reference reuse latency (MRRL) warmup strategy. BLRL achieves a warmup that is only half the warmup for MRRL on average for the same level of accuracy.