We describe a method for performance analysis of large software systems that combines a fast instruction-set simulator with off-line detailed analysis of segments of the execution. The combination is faster than straight cycle-accurate simulation, simpler and more flexible than techniques relying on hardware monitoring, and accurate. Specifically, the instruction-set simulator, running at a slowdown of around a factor of 50, maintains enough target state throughout the execution that an arbitrarily collected segment of the instruction trace is sufficient input for a post-processing, cycle-accurate model of the processor and memory hierarchy. We present a case study to support our contention that this reduced state is sufficient as input to a cycle-accurate simulator. Using a commercial M88110-based prototype system as a reference point, we show that for three trace segments, the cycle-accurate post-processing yields data reliable enough to guide system optimization.
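
The two-phase idea can be illustrated with a toy sketch. This is not the simulator described above: the direct-mapped cache, the line size, the hit and miss latencies, and the names `FastSimulator` and `replay_cycles` are all assumptions chosen for illustration. The fast pass executes every access without timing but keeps the cache tags up to date, so a segment collected at any point comes with a warm snapshot. The off-line pass then replays only that segment against a simple timing model.

```python
from dataclasses import dataclass, field

LINE_BYTES = 32  # assumed cache line size
NUM_SETS = 64    # assumed direct-mapped cache with 64 sets


def cache_slot(addr):
    """Map a byte address to (set index, tag) in the assumed cache."""
    line = addr // LINE_BYTES
    return line % NUM_SETS, line // NUM_SETS


@dataclass
class FastSimulator:
    """Functional pass: no timing, but cache tags are tracked throughout."""
    tags: dict = field(default_factory=dict)  # set index -> resident tag

    def run(self, addresses, seg_start, seg_end):
        """Process all accesses; return (snapshot, trace) for [seg_start, seg_end)."""
        snapshot, trace = None, []
        for i, addr in enumerate(addresses):
            if i == seg_start:
                # Reduced state handed to the off-line model.
                snapshot = dict(self.tags)
            if seg_start <= i < seg_end:
                trace.append(addr)
            index, tag = cache_slot(addr)
            self.tags[index] = tag
        if snapshot is None:
            # Segment starts past the end of the run: snapshot the final state.
            snapshot = dict(self.tags)
        return snapshot, trace


def replay_cycles(snapshot, trace, hit_cycles=1, miss_cycles=20):
    """Off-line pass: replay a trace segment from a snapshot and count cycles.

    The latencies are placeholder values, not measurements of any real system.
    """
    tags = dict(snapshot)
    cycles = 0
    for addr in trace:
        index, tag = cache_slot(addr)
        if tags.get(index) == tag:
            cycles += hit_cycles
        else:
            cycles += miss_cycles
            tags[index] = tag
    return cycles


if __name__ == "__main__":
    sim = FastSimulator()
    snapshot, trace = sim.run([0, 32, 0, 32, 0, 32], seg_start=4, seg_end=6)
    print("warm snapshot:", replay_cycles(snapshot, trace))
    print("cold start:   ", replay_cycles({}, trace))
```

Replaying the same segment from an empty snapshot charges a miss for every line that was in fact already cached, which is the kind of error that maintaining target state throughout the execution is meant to avoid.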