Cache technology plays a fundamental role in modern computer systems because it bridges the speed gap between processor and memory. Trace-driven simulation has been widely adopted for designing and evaluating cache architectures. However, as cache designs move to more complicated architectures, traces keep growing in size. Traditional simulation methods, which can only process trace operations in sequence, are no longer practical because of their long simulation times. In this paper, we explore both set-parallelism and search-parallelism in the cache simulation process, map our parallel algorithm to a GPU-CPU platform, and propose a trace-driven cache simulator on the GPU using Compute Unified Device Architecture (CUDA). Our experimental results show that the new algorithm achieves a 2.5x performance improvement over the traditional CPU-based serial algorithm.
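To make the idea of set-parallelism concrete, the following is a minimal CPU-side sketch in Python, not the CUDA implementation described in the paper. It assumes an LRU replacement policy (the abstract does not name one), and every function and parameter name here is hypothetical. It illustrates why sets can be simulated independently: an access only ever touches the set its address maps to, so splitting the trace by set index and simulating each sub-trace separately should give the same hit count as a serial pass.

```python
from collections import OrderedDict, defaultdict


def simulate_set_lru(tags, ways):
    """Simulate one cache set under LRU (assumed policy); return its hit count."""
    lru = OrderedDict()  # keys are resident tags, ordered oldest to newest
    hits = 0
    for tag in tags:
        if tag in lru:
            hits += 1
            lru.move_to_end(tag)         # mark as most recently used
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used tag
            lru[tag] = True
    return hits


def simulate_serial(trace, block_size, num_sets, ways):
    """Baseline: walk the whole trace in order, updating one set per access."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        block = addr // block_size
        index, tag = block % num_sets, block // num_sets
        lru = sets[index]
        if tag in lru:
            hits += 1
            lru.move_to_end(tag)
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)
            lru[tag] = True
    return hits


def simulate_by_set(trace, block_size, num_sets, ways):
    """Set-partitioned variant: split the trace by set index, then simulate
    each sub-trace on its own. The per-set calls share no state, so they
    could be handed to separate workers (for example, GPU thread blocks)."""
    per_set = defaultdict(list)
    for addr in trace:
        block = addr // block_size
        per_set[block % num_sets].append(block // num_sets)
    return sum(simulate_set_lru(tags, ways) for tags in per_set.values())
```

In this sketch the per-set simulations still run one after another; the point is only that they are independent. Search-parallelism, which in a GPU setting would compare a tag against all ways of a set at the same time, is represented here only by the dictionary lookup inside each set.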