We describe the design of a parallel trace-driven cache simulator for evaluating different cache structures. As research goes deeper, traditional simulation methods, which can only execute simulation operations sequentially, are no longer practical due to their long simulation cycles. An obvious way to achieve fast parallel simulation is to simulate the independent sets of a cache concurrently on different compute resources. We consider the use of a general-purpose GPU to accelerate cache simulation, exploiting set partitioning as the main source of parallelism. However, we show that this technique is inefficient when only a single cache configuration is simulated, because of the high correlation of activity between different sets. We therefore develop trace-sorting and single-pass multi-configuration simulation techniques, taking advantage of the full programmability offered by the Compute Unified Device Architecture (CUDA) on the GPU. Our experimental results demonstrate that the cache simulator on the GPU-CPU platform achieves a 2.44x performance improvement over the traditional sequential algorithm.
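The set-partitioning idea rests on the fact that accesses mapping to different cache sets never interact, so the trace can be split by set index and each partition simulated independently (e.g., one partition per GPU thread). The following Python sketch illustrates this for an LRU set-associative cache; all function names, parameters, and the LRU policy are illustrative assumptions, not the paper's actual implementation:

```python
# Sketch of set-partitioned trace-driven cache simulation.
# Assumes an LRU set-associative cache; all names are illustrative.
from collections import OrderedDict, defaultdict

def simulate_set(accesses, associativity):
    """Simulate one cache set with LRU replacement; return hit count."""
    lru = OrderedDict()  # tag -> None, ordered from least to most recent
    hits = 0
    for tag in accesses:
        if tag in lru:
            hits += 1
            lru.move_to_end(tag)       # mark as most recently used
        else:
            if len(lru) >= associativity:
                lru.popitem(last=False)  # evict least recently used
            lru[tag] = None
    return hits

def simulate_cache(trace, num_sets, associativity, block_size):
    # Partition the trace by set index ("trace sort"); each partition
    # is independent, so each could run on a separate compute resource.
    partitions = defaultdict(list)
    for addr in trace:
        block = addr // block_size
        partitions[block % num_sets].append(block // num_sets)  # tag
    hits = sum(simulate_set(a, associativity) for a in partitions.values())
    return hits, len(trace) - hits
```

Note that when only one configuration is simulated, hot sets receive far more accesses than cold ones, so the partitions are badly imbalanced; this is the load-imbalance problem the abstract attributes to correlated set activity.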