Workload Design: Selecting Representative Program-Input Pairs
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
The Accuracy of Initial Prediction in Two-Phase Dynamic Binary Translators
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Instruction Based Memory Distance Analysis and its Application
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Evaluating the correspondence between training and reference workloads in SPEC CPU2006
ACM SIGARCH Computer Architecture News
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Evaluating iterative optimization across 1000 datasets
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Deconstructing iterative optimization
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Smaller input data sets such as the test and the train input sets are commonly used in simulation to estimate the impact of architecture/micro-architecture features on the performance of SPEC benchmarks. They are also used for profile feedback compiler optimizations. In this paper, we examine the reliability of reduced input sets for performance simulation and profile feedback optimizations. We study the high level metrics such as IPC and procedure level profiles as well as lower level measurements such as execution paths exercised by various input sets on the SPEC2000int benchmark. Our study indicates that the test input sets are not suitable to be used for simulation because they do not have an execution profile similar to the reference input runs. The train data set is better than the test data sets at maintaining similar profiles to the reference input set. However, the observed execution paths leading to cache misses are very different between using the smaller input sets and the reference input sets. For current profile based optimizations, the differences in quality of profiles may not have a significant impact on performance, as tested on the Itanium processor with Intel compiler. However, we believe the impact of profile quality will be greater for more aggressive profile guided optimizations, such as cache prefetching.