Using GPU to accelerate a pin-based multi-level cache simulator

Authors:
Wan Han; Gao Xiaopeng; Long Xiang; Chen Xianqin
Affiliations:
Beihang University, Beijing, China; Beihang University, Beijing, China; Beihang University, Beijing, China; Beihang University, Beijing, China
Venue:
SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
Year:
2010

Abstract

Trace-driven simulation is the most widely used method for evaluating designs of future computer memory architectures. Because this methodology demands large amounts of storage and compute time, there is a growing need for simulation methodologies that can determine the memory system requirements of emerging workloads in a reasonable amount of time. Several techniques have been proposed to reduce the space needed to store memory references and to improve the performance of sequential trace-driven simulation. This paper presents the use of binary instrumentation as the memory reference generator, together with a parallel simulation technique based on a generic graphics processing unit (GPU). One way to achieve fast parallel simulation is to simulate the independent sets of a cache concurrently on different compute resources, but our results show that this method is not efficient because of the high correlation in activity between different sets. To put parallelism to effective use, we show that simulating multiple configurations in a single pass achieves a 2.44x performance improvement over the traditional sequential algorithm.
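
The idea of simulating several cache configurations in one pass over a trace can be illustrated with a minimal sequential sketch. This is not the authors' GPU implementation: the `LRUCache` class, the `simulate_single_pass` function, the LRU replacement policy, the configuration tuples, and the tiny trace are all assumptions made for illustration. The comment about mapping one configuration to one GPU thread is likewise only one plausible reading of the abstract.

```python
from collections import OrderedDict


class LRUCache:
    """Set-associative cache with LRU replacement (hypothetical helper)."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set; keys are tags, insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.line_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(lines) >= self.ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True


def simulate_single_pass(trace, configs):
    """Read the trace once and feed each reference to every configuration."""
    caches = [LRUCache(*cfg) for cfg in configs]
    for address in trace:
        # Assumption: on a GPU, each configuration could be handled by its own thread.
        for cache in caches:
            cache.access(address)
    return [(c.hits, c.misses) for c in caches]


# Made-up byte addresses; a real trace would come from an instrumentation tool.
trace = [0, 64, 128, 0, 64, 256, 0]
configs = [(1, 1, 64), (2, 2, 64)]  # (num_sets, ways, line_size)
results = simulate_single_pass(trace, configs)
print(results)
```

Because the configurations share no state, each reference is read once and applied to every cache independently. That independence is what makes this approach easier to parallelize than splitting a single cache by set, where the abstract reports that correlated activity between sets limits the benefit.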