This paper presents a novel technique for cycle-accurate simulation of the Central Processing Unit (CPU) of a modern superscalar processor, the UltraSPARC III Cu. Rather than fully modelling the CPU microarchitecture, as is traditionally done, the technique adds a module to an existing fetch-decode-execute style CPU simulator. It is also suitable for accurate SMP modelling. The module's main functions are the simulation of instruction grouping, register interlocks and the store buffer. Its simple table-driven implementation is easy to modify when exploring microarchitectural variations. The technique results in a 40% loss of simulation speed, compared with the tenfold or greater slowdown incurred by fully implementing the detailed microarchitecture. The technique is validated against an actual UltraSPARC III Cu processor and achieves high levels of accuracy over a range of scientific benchmarks.
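As a rough illustration of the idea of a table-driven timing module attached to a functional simulator, the following Python sketch tracks instruction grouping and register interlocks only. It is not the paper's implementation: the names `TimingModule`, `LATENCY` and `GROUP_WIDTH`, the latency values, and the grouping rule are all assumptions chosen for illustration, and the store buffer is not modelled.

```python
from dataclasses import dataclass, field

# Hypothetical latency table: cycles until a result becomes available.
# These are illustrative values, NOT real UltraSPARC III Cu latencies.
LATENCY = {"alu": 1, "load": 2, "mul": 4, "store": 1}

# Assumed maximum number of instructions issued together in one cycle.
GROUP_WIDTH = 4


@dataclass
class TimingModule:
    cycle: int = 0        # cycle in which the current group issues
    slots_used: int = 0   # instructions already placed in the current group
    # Maps a register name to the cycle in which its value becomes available.
    ready: dict = field(default_factory=dict)

    def issue(self, op, dst, srcs):
        """Account for one functionally executed instruction.

        Returns the cycle in which the instruction issues.
        """
        # Register interlock: all source operands must be available.
        earliest = max([self.ready.get(r, 0) for r in srcs], default=0)
        if earliest > self.cycle:
            # Stall until the operands are ready, then start a new group.
            self.cycle = earliest
            self.slots_used = 0
        elif self.slots_used == GROUP_WIDTH:
            # The current group is full, so start a new one next cycle.
            self.cycle += 1
            self.slots_used = 0
        self.slots_used += 1
        if dst is not None:
            self.ready[dst] = self.cycle + LATENCY[op]
        return self.cycle
```

In this sketch the functional simulator would call `issue` once per executed instruction. Exploring a microarchitectural variation then amounts to editing `LATENCY` or `GROUP_WIDTH` rather than rewriting a pipeline model, which is the kind of flexibility a table-driven design is meant to offer.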