Energy-optimal synchronization primitives for single-chip multi-processors

Authors:
Cesare Ferri;Ruth Iris Bahar;Mirko Loghi;Massimo Poncino
Affiliations:
Brown University, Providence, RI, USA;Brown University, Providence, RI, USA;Politecnico di Torino, Torino, Italy;Politecnico di Torino, Torino, Italy
Venue:
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Year:
2009

Abstract

Synchronization among tasks accounts for a sizable fraction of the energy consumption and execution time of applications running on Multi-Processor System-on-Chip platforms. To achieve fast and energy-efficient operation, it is therefore essential to implement efficient and power-frugal synchronization primitives. The design of such primitives is complicated by several software and hardware issues, such as processors running at different speeds, different implementations of the waiting phase upon entering the critical section, and the ratio between static and dynamic power. In this work, we compare a set of classical implementations of mutex semaphores (i.e., based on busy waiting or on sleep states) and propose a hybrid (wait/sleep) semaphore in which the sleep state is entered only after a number of busy-wait cycles. The proposed scheme provides the best overall energy-delay product compared with previously proposed schemes. Furthermore, we identify an optimal length of the busy-wait phase, which is empirically shown to depend on the time required to switch from the sleep state to the active state.
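The hybrid (wait/sleep) idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `HybridLock`, the parameter `spin_limit`, and the helper `count_with_lock` are hypothetical names introduced here, and blocking on a `threading.Lock` is only a software stand-in for putting a processor into a hardware sleep state. The sketch polls the lock a bounded number of times and falls back to a blocking wait only if polling fails.

```python
import threading


class HybridLock:
    """Mutex that busy-waits for a bounded number of attempts, then sleeps.

    `spin_limit` is a hypothetical tuning knob standing in for the
    busy-wait length discussed in the abstract.
    """

    def __init__(self, spin_limit=100):
        self._lock = threading.Lock()
        self.spin_limit = spin_limit
        # Counters are only modified while the lock is held.
        self.spin_acquires = 0
        self.sleep_acquires = 0

    def acquire(self):
        # Busy-wait phase: poll the lock without blocking.
        for _ in range(self.spin_limit):
            if self._lock.acquire(blocking=False):
                self.spin_acquires += 1
                return "spin"
        # Sleep phase: block until the holder releases the lock.
        # (Assumption: blocking models entering a sleep state.)
        self._lock.acquire()
        self.sleep_acquires += 1
        return "sleep"

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False


def count_with_lock(n_threads, n_iters, spin_limit):
    """Increment a shared counter from several threads under a HybridLock."""
    lock = HybridLock(spin_limit=spin_limit)
    total = [0]

    def worker():
        for _ in range(n_iters):
            with lock:
                total[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0], lock.spin_acquires + lock.sleep_acquires
```

Setting `spin_limit` to 0 degenerates to a purely sleep-based lock, while a very large value approximates pure busy waiting. The abstract's observation that the best busy-wait length depends on the sleep-to-active switching time would correspond, in this sketch, to choosing `spin_limit` relative to the cost of the blocking path.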