GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs

Authors:
Jose L. Abellán;Juan Fernández;Manuel E. Acacio
Affiliations:
-;-;-
Venue:
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Year:
2011

Citing 0
Cited 4

Remote core locking: migrating critical-section execution to improve the performance of multithreaded applications

USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ACM SIGOPS 24th Symposium on Operating Systems Principles
Everything you always wanted to know about synchronization but were afraid to ask

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Leveraging hardware message passing for efficient thread synchronization

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Quantified Score

Hi-index	0.02

Visualization

Abstract

Synchronization is of paramount importance to exploit thread-level parallelism on many-core CMPs. In these architectures, synchronization mechanisms usually rely on shared variables to coordinate multithreaded access to shared data structures thus avoiding data dependency conflicts. Lock synchronization is known to be a key limitation to performance and scalability. On the one hand, lock acquisition through busy waiting on shared variables generates additional coherence activity which interferes with applications. On the other hand, lock contention causes serialization which results in performance degradation. This paper proposes and evaluates \textit{GLocks}, a hardware-supported implementation for highly-contended locks in the context of many-core CMPs. \textit{GLocks} use a token-based message-passing protocol over a dedicated network built on state-of-the-art technology. This approach skips the memory hierarchy to provide a non-intrusive, extremely efficient and fair lock implementation with negligible impact on energy consumption or die area. A comprehensive comparison against the most efficient shared-memory-based lock implementation for a set of micro benchmarks and real applications quantifies the goodness of \textit{GLocks}. Performance results show an average reduction of 42% and 14% in execution time, an average reduction of 76% and 23% in network traffic, and also an average reduction of 78% and 28% in energy-delay$^2$ product (ED$^2$P) metric for the full CMP for the micro benchmarks and the real applications, respectively. In light of our performance results, we can conclude that \textit{GLocks} satisfy our initial working hypothesis. \textit{GLocks} minimize cache-coherence network traffic due to lock synchronization which translates into reduced power consumption and execution time.