Design and Implementation of a Workload Specific Simulator
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
This paper describes Shaman, our distributed architectural simulator of shared memory multiprocessors. The simulator runs on a PC cluster that consists of multiple front-end nodes, which simulate the instruction-level behavior of a target multiprocessor in parallel, and a back-end node, which simulates the target memory system. The front-end also simulates the logical behavior of the shared memory using a software DSM technique and generates memory references to drive the back-end. A notable feature of our simulator is reference filtering, which reduces the number of references transferred from the front-end to the back-end by utilizing the DSM mechanism and coherent cache simulation on the front-end. This technique and the sophisticated DSM implementation discussed in this paper give the Shaman simulator very high performance. We achieved 335 million and 392 million simulation clocks per second for LU decomposition and FFT in the SPLASH-2 kernel benchmarks, respectively, when we used 16 front-end nodes to simulate a 16-way target SMP.
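The general idea behind reference filtering can be illustrated with a minimal sketch. This is not Shaman's actual implementation: it assumes a single front-end node with a tiny direct-mapped cache model and leaves out the DSM mechanism and cache coherence entirely. The names `FrontEndFilter`, `run`, `LINE_SIZE`, and `NUM_SETS` are hypothetical. References that hit in the front-end cache model are dropped, and only misses are forwarded to the back-end.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # bytes per cache line (assumed value)
NUM_SETS = 4    # number of sets in the toy direct-mapped cache (assumed value)


@dataclass
class FrontEndFilter:
    """Hypothetical front-end filter: forwards only references that miss."""
    tags: dict = field(default_factory=dict)       # set index -> cached line number
    forwarded: list = field(default_factory=list)  # references sent to the back-end

    def access(self, addr: int) -> bool:
        """Simulate one reference; return True if it was filtered out (a hit)."""
        line = addr // LINE_SIZE
        index = line % NUM_SETS
        if self.tags.get(index) == line:
            return True                # hit: nothing is sent to the back-end
        self.tags[index] = line        # miss: fill the line, evicting the old one
        self.forwarded.append(addr)    # only misses drive the back-end
        return False


def run(trace):
    """Feed an address trace through the filter and return what was forwarded."""
    f = FrontEndFilter()
    for addr in trace:
        f.access(addr)
    return f.forwarded


if __name__ == "__main__":
    print(run([0, 8, 64, 0, 256, 0]))  # prints [0, 64, 256, 0]
```

In this toy trace, six references shrink to four forwarded ones: the accesses to addresses 8 and the second access to 0 hit in the front-end model, while the final access to 0 misses because address 256 evicted its line.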