(R) The Impact of Speeding up Critical Sections with Data Prefetching and Forwarding

Venue: ICPP '96 Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
Year: 1996

Citing 0
Cited 7

The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors
International Journal of Parallel Programming

The Affinity Entry Consistency Protocol
ICPP '97 Proceedings of the International Conference on Parallel Processing

Cache Injection: A Novel Technique for Tolerating Memory Latency in Bus-Based SMPs
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing

Inferential queueing and speculative push for reducing critical communication latencies
ICS '03 Proceedings of the 17th annual international conference on Supercomputing

Inferential queueing and speculative push
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)

Data-Driven Multithreading Using Conventional Microprocessors
IEEE Transactions on Parallel and Distributed Systems

Predicting Coherence Communication by Tracking Synchronization Points at Run Time
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

Abstract

While shared-memory multiprocessing offers a simple model for process synchronization, actual synchronization may be expensive. Indeed, processors may have to wait for a long time to acquire the lock of a critical section. In addition, a processor may have to stall for a long time waiting for all of its pending accesses to complete before releasing the lock. To address this problem, we target well-known optimization techniques specifically to speed up accesses to critical sections. We reduce the time taken by critical sections by applying data prefetching and forwarding to minimize the number of misses inside these sections. In addition, we prefetch and forward data in exclusive mode to reduce the stall time before lock release. Our evaluation shows that a simple prefetching algorithm can speed up parallel applications significantly at a very low cost. With this optimization, five Splash applications run 20% faster on average, while one of them runs 52% faster. We also conclude that more complicated, forward-based optimizations are not justified.
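The idea of prefetching critical-section data in exclusive mode can be illustrated with a minimal C sketch. This is not the paper's algorithm, whose details the abstract does not give. It assumes a GCC- or Clang-compatible compiler (for `__builtin_prefetch`) and POSIX threads. The type `shared_account_t` and the functions `account_init` and `account_add` are hypothetical names introduced only for this illustration. The prefetch is a hint with write intent, a rough software analogue of an exclusive-mode prefetch, and the hardware may ignore it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical lock-protected record; not taken from the paper. */
typedef struct {
    pthread_mutex_t lock;
    long updates;   /* number of updates applied */
    long total;     /* running sum of all amounts */
} shared_account_t;

/* Initialize the lock and zero the protected fields.
 * Returns 0 on success, or the pthread error code. */
static int account_init(shared_account_t *a)
{
    assert(a != NULL);
    a->updates = 0;
    a->total = 0;
    return pthread_mutex_init(&a->lock, NULL);
}

/* Update the record inside a critical section. */
static void account_add(shared_account_t *a, long amount)
{
    assert(a != NULL);

    /* Assumption: request the data with write intent (rw = 1) before
     * entering the critical section, so the stores below are more likely
     * to hit in the cache and complete before the lock is released.
     * This is only a hint; correctness does not depend on it. */
    __builtin_prefetch(&a->updates, 1, 3);

    pthread_mutex_lock(&a->lock);
    a->updates += 1;
    a->total += amount;
    pthread_mutex_unlock(&a->lock);
}
```

Whether such a hint helps depends on when it is issued relative to lock acquisition. Prefetching too early, while another processor is still inside the critical section, may simply move the cache line back and forth between processors.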