Cache performance for multimedia applications
ICS '01 Proceedings of the 15th international conference on Supercomputing
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Analysis of Shared Memory Misses and Reference Patterns
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Design and Implementation of a Workload Specific Simulator
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
This paper presents Cerberus, an efficient system for simulating the execution of shared-memory multiprocessor programs on a uniprocessor workstation. Using execution-driven simulation (EDS), it generates address traces that can drive cache simulations on the fly, eliminating the large disk space that trace files require. It is fast because it links the traced program and the cache simulator or statistics-gathering tool into a single executable, which eliminates the context switching required between communicating processes. It is flexible because its simple interface lets users easily add any kind of module that consumes the generated trace information. It compares favorably with existing tracers and runs on a commonly available workstation. It is also accurate, allowing cycle-by-cycle interactions between the simulated processors. Cerberus slows execution by a factor of approximately 31 in uniprocessor mode and 45-50 in multiprocessor mode, relative to the same workloads run natively on the same machines. We demonstrate that EDS accounts for only 5 percent of the total execution cycles when combined with a cache simulator, and show that EDS is just as efficient as trace-driven simulation.
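The idea of driving a cache simulation on the fly from inside a single process can be illustrated with a minimal sketch. It is not Cerberus's actual interface: the record layout `(cpu, addr, is_write)`, the names `DirectMappedCache`, `run_traced`, and `toy_workload`, and the toy direct-mapped cache model are all assumptions made for illustration. The point is only that each memory reference is handed to the consumer modules by a direct function call as it is produced, so no trace file is written and no inter-process communication is involved.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical reference record: (cpu_id, address, is_write).
# This layout is an assumption, not the format Cerberus uses.
Ref = Tuple[int, int, bool]
Consumer = Callable[[int, int, bool], None]


class DirectMappedCache:
    """Toy direct-mapped cache that only counts hits and misses."""

    def __init__(self, num_lines: int = 4, line_size: int = 16) -> None:
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags: List[Optional[int]] = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def on_reference(self, cpu: int, addr: int, is_write: bool) -> None:
        # Split the address into a block number, then into index and tag.
        block = addr // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag  # fill the line on a miss


def run_traced(workload: Iterable[Ref], consumers: List[Consumer]) -> int:
    """Hand each reference to every consumer as soon as it is generated.

    Nothing is buffered or written to disk; returns the reference count.
    """
    count = 0
    for cpu, addr, is_write in workload:
        for consume in consumers:
            consume(cpu, addr, is_write)
        count += 1
    return count


def toy_workload() -> Iterable[Ref]:
    """Stand-in for an instrumented program: reads a 64-byte array twice."""
    for _ in range(2):
        for addr in range(0, 64, 4):
            yield (0, addr, False)


if __name__ == "__main__":
    cache = DirectMappedCache()
    total = run_traced(toy_workload(), [cache.on_reference])
    print(total, cache.hits, cache.misses)
```

Adding another module, such as a statistics gatherer, amounts to appending one more callable to the `consumers` list, which loosely mirrors the pluggable-module flexibility the abstract describes. With the default 64-byte cache, the toy workload produces 32 references, of which 4 miss (one per cache line on the first pass) and 28 hit.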