Effectiveness of trace sampling for performance debugging tools
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Continuous profiling: where have all the cycles gone?
ACM Transactions on Computer Systems (TOCS)
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Measuring Experimental Error in Microprocessor Simulation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Reducing State Loss For Effective Trace Sampling of Superscalar Processors
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Accuracy and Speedup of Parallel Trace-Driven Architectural Simulation
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Modeling Superscalar Processors via Statistical Simulation
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Code Cloning Tracing: A ``Pay per Trace'' Approach
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A Framework for Statistical Modeling of Superscalar Processor Performance
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Processor Modeling for Hardware Software Codesign
VLSID '99 Proceedings of the 12th International Conference on VLSI Design - 'VLSI for the Information Appliance'
Let's Study Whole-Program Cache Behaviour Analytically
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Performance analysis through synthetic trace generation
ISPASS '00 Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software
VHC: Quickly Building an Optimizer for Complex Embedded Architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
EXPERT: expedited simulation exploiting program behavior repetition
Proceedings of the 18th annual international conference on Supercomputing
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Accelerated warmup for sampled microarchitecture simulation
ACM Transactions on Architecture and Code Optimization (TACO)
A multinomial clustering model for fast simulation of computer architecture designs
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Design and Implementation of aWorkload Specific Simulator
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
Branch Predictor Warmup for Sampled Simulation through Branch History Matching
Transactions on High-Performance Embedded Architectures and Compilers II
Branch history matching: branch predictor warmup for sampled simulation
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
parSC: synchronous parallel systemc simulation on multi-core host architectures
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Hi-index | 0.00 |
While architecture simulation is often treated as a methodology issue, it is at the core of most processor architecture research works, and simulation speed is often the bottleneck of the typical trial-and-error research process. To speedup simulation during this research process and get trends faster, researchers usually reduce the trace size. More sophisticated techniques like trace sampling or distributed simulation are scarcely used because they are considered unreliable and complex due to their impact on accuracy and the associated warm-up issues.In this article, we present DiST, a practical distributed simulation scheme where, unlike in other simulation techniques that trade accuracy for speed, the user is relieved from most accuracy issues thanks to an automatic and dynamic mechanism for adjusting the warm-up interval size. Moreover, the mechanism is designed so as to always privilege accuracy over speedup. The speedup scales with the amount of available computing resources, bringing an average 7.35 speedup on 10 machines with an average IPC error of 1.81% and a maximum IPC error of 5.06%.Besides proposing a solution to the warm-up issues in distributed simulation, we experimentally show that our technique is significantly more accurate than trace size reduction or trace sampling for identical speedups. We also show that not only the error always remains small for IPC and other metrics, but that a researcher can reliably base research decisions on DiST simulation results. Finally, we explain how the DiST tool is designed to be easily pluggable into existing architecture simulators with very few modifications.