Speeding-up synchronizations in DSM multiprocessors

Authors:
A. de Dios;B. Sahelices;P. Ibáñez;V. Viñals;J. M. Llabería
Affiliations:
Dpto. de Informática, Univ. de Valladolid;Dpto. de Informática, Univ. de Valladolid;Dpto. de Informática e Ing. de Sistemas, I3A and HiPEAC, Univ. de Zaragoza;Dpto. de Informática e Ing. de Sistemas, I3A and HiPEAC, Univ. de Zaragoza;Dpto. de Arquitectura de Computadores, Univ. Polit. de Cataluña
Venue:
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Year:
2006

Citing 17
Cited 0

Efficient synchronization primitives for large-scale cache-coherent multiprocessors

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Synchronization Algorithms for Shared-Memory Multiprocessors

Computer
Scalable coherent interface

Computer
Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
The Stanford Dash Multiprocessor

Computer
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient synchronization: let them eat QOLB

Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
Architecture and design of AlphaServer GS320

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Queue Locks on Cache Coherent Multiprocessors

Proceedings of the 8th International Symposium on Parallel Processing
Inferential queueing and speculative push for reducing critical communication latencies

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Ocean warning: avoid drowning

ACM SIGARCH Computer Architecture News
Mechanisms for efficient shared-memory, lock-based synchronization

Mechanisms for efficient shared-memory, lock-based synchronization
The Impact of Negative Acknowledgments in Shared Memory Scientific Applications

IEEE Transactions on Parallel and Distributed Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Synchronization in parallel programs is a major performance bottleneck. Shared data is protected by locks and a lot of time is spent in the competition arising at the lock hand-off. In this period of time, a large amount of traffic is targeted to the line holding the lock variable. In order to be serialized, the requests to the same cache line can either be bounced (NACKed) or buffered in the coherence controller. In this paper we focus on systems whose coherence controllers buffer requests. During lock hand-off only the requests from the winning processor contribute to the computation progress, because the winning processor is the only one that will advance the work. This key observation leads us to propose a hardware mechanism named Request Bypass, which allows requests from the winning processor to bypass the requests buffered in the home coherence controller keeping the lock line. The mechanism does not require compiler or programmer support nor ISA or coherence protocol changes. By simulating a 32 processor system we show that Request Bypass reduces execution time and lock stall time up to 35% and 75%, respectively. The programs limited by synchronization benefit the most from Request Bypass.