Using SimPoint for accurate and efficient simulation
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Characterizing and Comparing Prevailing Simulation Techniques
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Statistical Simulation of Multithreaded Architectures
MASCOTS '05 Proceedings of the 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
A co-phase matrix to guide simultaneous multithreading simulation
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Motivation for Variable Length Intervals and Hierarchical Phase Behavior
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Detecting phases in parallel applications on shared memory architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Multiprocessor performance estimation using hybrid simulation
Proceedings of the 45th annual Design Automation Conference
Multi-granularity sampling for simulating concurrent heterogeneous applications
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
Simulating chip-multiprocessor systems (CMP) can take a long time. For single-threaded workloads, earlier work has shown the utility of phase analysis, that is identification of repetitive program behaviors, in reducing overall simulation time while maintaining an acceptable loss in accuracy. To cope with multithreaded workloads, a combination of phases from all executing threads must be taken into consideration since inter-thread interference may distort the homogeneity of each phases' true performance. Unfortunately, phase analysis does not work for multithreaded (MT) workloads because the possible phase combinations in an inherently nondeterministic execution model grows exponentially with the number of threads. To this end, we propose a new technique to reduce the number of simulation samples by synthesizing samples from similar phase combinations. We present a simple cost function for measuring the similarity between phase combinations and by using the individual thread samples from the similar phase combinations, a new sample can be constructed. This cost function provides a convenient control knob for exploiting tradeoffs between simulation speed and accuracy. Our experimental results show that in most cases, properly setting the cost function's threshold can yield a reduction in sampling by 90%, while maintaining error to less than 5%.