Synchronization among tasks accounts for a sizable fraction of the energy consumption and execution time of applications running on Multi-Processor System-on-Chip (MPSoC) platforms. Achieving fast and energy-efficient operation therefore requires synchronization primitives that are both efficient and power-frugal. The design of such primitives is complicated by several software and hardware issues, including processors running at different speeds, different implementations of the waiting phase at the entry to the critical section, and the ratio between static and dynamic power. In this work, we compare a set of classical implementations of mutex semaphores (based either on busy waiting or on sleep states) and propose a hybrid (wait/sleep) semaphore in which the sleep state is entered only after a number of busy-wait cycles. The proposed scheme provides the best overall energy-delay product compared with previously proposed schemes. Furthermore, we identify an optimal length of the busy-wait phase, which is empirically shown to depend on the time required to switch from the sleep state to the active state.
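
The hybrid (wait/sleep) idea can be illustrated with a minimal software sketch. This is not the implementation evaluated in the work. It is an analogue in Python, in which a blocking `threading.Lock.acquire()` stands in for the processor sleep state and a bounded loop of non-blocking acquire attempts stands in for the busy-wait cycles. The names `HybridMutex`, `spin_limit`, `spin_acquires`, `sleep_acquires`, and `run_workers` are hypothetical and introduced only for illustration.

```python
import threading


class HybridMutex:
    """Spin-then-sleep mutex sketch (illustrative names, not from the paper)."""

    def __init__(self, spin_limit=100):
        self._lock = threading.Lock()
        self.spin_limit = spin_limit   # assumed tunable: length of the busy-wait phase
        self.spin_acquires = 0         # acquisitions that succeeded while spinning
        self.sleep_acquires = 0        # acquisitions that fell back to blocking

    def acquire(self):
        # Busy-wait phase: poll the lock without blocking, at most spin_limit times.
        for _ in range(self.spin_limit):
            if self._lock.acquire(blocking=False):
                self.spin_acquires += 1   # updated while holding the lock
                return
        # Sleep phase: block until the lock is released (stand-in for a sleep state).
        self._lock.acquire()
        self.sleep_acquires += 1          # updated while holding the lock

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False


def run_workers(num_threads=4, increments=1000, spin_limit=100):
    """Increment a shared counter under the mutex; return (counter, mutex)."""
    mutex = HybridMutex(spin_limit=spin_limit)
    counter = [0]

    def worker():
        for _ in range(increments):
            with mutex:
                counter[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0], mutex


if __name__ == "__main__":
    total, mutex = run_workers()
    print("counter:", total)
    print("spin acquisitions:", mutex.spin_acquires)
    print("sleep acquisitions:", mutex.sleep_acquires)
```

Setting `spin_limit` to 0 reduces the sketch to a purely sleep-based mutex, while a very large value approximates pure busy waiting. The abstract reports that the best busy-wait length depends on the sleep-to-active switching time, so on a real platform this parameter would be tuned against that latency rather than fixed as it is here.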