Architectural Support for Enhanced SMT Job Scheduling
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
How to use SimPoint to pick simulation points
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Methods for Modeling Resource Contention on Simultaneous Multithreading Processors
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
Automatic phase detection for stochastic on-chip traffic generation
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Phase guided sampling for efficient parallel application simulation
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Applying Statistical Sampling for Fast and Efficient Simulation of Commercial Workloads
IEEE Transactions on Computers
Multi-granularity sampling for simulating concurrent heterogeneous applications
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Accelerating multi-core simulators
Proceedings of the 2010 ACM Symposium on Applied Computing
Detecting phases in parallel applications on shared memory architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
Selecting representative benchmark inputs for exploring microprocessor design spaces
ACM Transactions on Architecture and Code Optimization (TACO)
PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.01 |
Several commercial processors have architectures that include support for simultaneous multithreading (SMT), yet there is still not a validated methodology for estimating the performance of an SMT machine that does not rely on full program simulation. To create an efficient sampling approach for SMT we must determine how far to fast-forward each individual thread between samples. The fast-forwarding distance for each thread will vary according to execution phases, thread interactions and changes to the architectural configuration. We examine using individual program phase information to guide SMT simulation. This is accomplished by creating what we call a co-phase matrix. The co-phase matrix is populated by collecting samples of the programs' phase combinations, and is used to guide fastforwarding between samples. We show for 28 pairs of SPEC programs that using the co-phase matrix provides an average error rate of 4% while requiring that only 1% of the full simulation be performed. The methods are also validated using a variety of architectural configurations and four-threaded workloads.