In this paper we describe how we have used Pin to generate a multithreaded reference stream for simulating a multiprocessor on a uniprocessor. We have taken special care to model, as accurately as possible, the effects of cache coherence protocol state and of lock and barrier synchronization on the performance of multithreaded applications running on multiprocessor hardware. We first describe a simplified version of the algorithm, which uses semaphores to synchronize the instrumented application threads with the simulator. We then describe modifications to that algorithm that model the microarchitectural features of the Itanium 2 that affect the timing of memory reference issue. An experimental evaluation shows that, while our methods enable accurate simulation, the use of semaphores has a negative impact on the performance of the simulator.
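The abstract does not spell out the simplified algorithm, so the following is only a minimal sketch of one way a semaphore handshake between application threads and a simulator could look. It is written in Python rather than as a Pin tool, and every name in it (`ReferenceStream`, `post`, `simulate`, `worker`, `run`) is hypothetical. It assumes each thread blocks after posting a memory reference until the simulator has consumed it, which is one plausible reading of "synchronize", not the authors' documented design.

```python
import threading
from collections import deque


class ReferenceStream:
    """Hypothetical handshake between application threads and a simulator."""

    def __init__(self, num_threads):
        self.pending = deque()                    # references awaiting simulation
        self.lock = threading.Lock()              # protects self.pending
        self.available = threading.Semaphore(0)   # counts posted references
        # One semaphore per thread; the simulator releases it to resume that thread.
        self.resume = [threading.Semaphore(0) for _ in range(num_threads)]
        self.trace = []                           # order in which references were simulated

    def post(self, tid, addr, is_write):
        """Called by an application thread before each memory reference."""
        with self.lock:
            self.pending.append((tid, addr, is_write))
        self.available.release()     # tell the simulator a reference is ready
        self.resume[tid].acquire()   # stall until the simulator has consumed it

    def simulate(self, total_refs):
        """Simulator loop: consume one reference at a time, then resume its thread."""
        for _ in range(total_refs):
            self.available.acquire()
            with self.lock:
                ref = self.pending.popleft()
            # Stand-in for the cache and coherence model (assumption).
            self.trace.append(ref)
            self.resume[ref[0]].release()


def worker(stream, tid, addrs):
    # Stand-in for an instrumented application thread issuing loads.
    for addr in addrs:
        stream.post(tid, addr, is_write=False)


def run(num_threads=2, refs_per_thread=3):
    stream = ReferenceStream(num_threads)
    sim = threading.Thread(
        target=stream.simulate, args=(num_threads * refs_per_thread,)
    )
    workers = [
        threading.Thread(
            target=worker,
            args=(stream, t, [0x1000 * (t + 1) + 8 * i for i in range(refs_per_thread)]),
        )
        for t in range(num_threads)
    ]
    sim.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    sim.join()
    return stream.trace
```

Calling `run()` returns the interleaved trace as a list of `(thread id, address, is_write)` tuples. The interleaving across threads depends on scheduling, but each thread's own references appear in program order. Because every reference costs two semaphore operations and a context switch in this sketch, it also hints at why a semaphore-based scheme can slow the simulator down.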