Cache simulator based on GPU acceleration
Proceedings of the 2nd International Conference on Simulation Tools and Techniques
Trace-driven simulation is the most widely used method for evaluating the design of future computer memory architectures. Because this methodology demands large amounts of storage and compute time, there is a growing need for simulation methodologies that determine the memory system requirements of emerging workloads in a reasonable amount of time. Several techniques have been proposed to reduce the space needed to store memory references and to improve the performance of sequential trace-driven simulation. This paper presents the use of binary instrumentation as the memory reference generator, together with a parallel simulation technique based on a generic graphics processing unit (GPU). One way to achieve fast parallel simulation is to simulate the independent sets of a cache concurrently on different compute resources, but results show that this method is inefficient because of the high correlation of activity between different sets. To put parallelism to effective use, we show that simulating multiple configurations in a single pass achieves a 2.44x performance improvement over the traditional sequential algorithm.
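The idea of simulating several cache configurations in a single pass over the trace can be illustrated with a small CPU-only sketch. This is not the paper's GPU implementation: the class `LRUCache`, the function `simulate_single_pass`, the LRU replacement policy, and the tiny address trace are all assumptions chosen for illustration. In a GPU version, the inner loop over configurations is the part that would plausibly be mapped to parallel threads, since each configuration keeps its own independent state.

```python
from collections import OrderedDict


class LRUCache:
    """Set-associative cache model with LRU replacement (hypothetical helper)."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered map per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.line_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) >= self.ways:
                lines.popitem(last=False)  # evict least recently used line
            lines[tag] = True


def simulate_single_pass(trace, configs):
    """Read the trace once and update every configuration per reference.

    configs is a list of (num_sets, ways, line_size) tuples (assumed format).
    Returns one (hits, misses) pair per configuration.
    """
    caches = [LRUCache(*config) for config in configs]
    for address in trace:
        # Each configuration is independent of the others, so this loop
        # is the natural candidate for parallel execution.
        for cache in caches:
            cache.access(address)
    return [(cache.hits, cache.misses) for cache in caches]


# Illustrative trace of byte addresses (made up, not measured data).
trace = [0, 64, 128, 0, 64, 256, 0]
configs = [(1, 1, 64), (2, 2, 64)]
results = simulate_single_pass(trace, configs)
print(results)  # [(0, 7), (3, 4)]
```

The trace is read only once regardless of how many configurations are evaluated, which is what distinguishes this scheme from rerunning a sequential simulator once per configuration.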