We present a framework for a distributed, low-cost implementation of synchronization mechanisms for embedded shared-memory multiprocessors. The proposed architecture implements queued-lock semantics in a fully decentralized manner, through low-cost, distributed synchronization controllers that execute distributed synchronization-management protocols. The approach offers three major benefits. First, it completely eliminates the heavy bus contention traffic that arises when multiple cores compete for a synchronization variable. Second, it achieves extremely low best-case lock-acquisition latency, requiring zero bus transactions. Third, it opens multiple avenues for high energy efficiency: each local synchronization controller can determine exactly when a lock becomes available to, or a barrier is enabled at, its processor, without any bus transactions or local cache spinning. This allows the system software or the thread library to apply various low-power policies.
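The queued-lock behaviour described above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the paper's protocol: it assumes a snooping bus on which every controller observes every request, models the lock as a token that migrates between per-core controllers, and keeps the wait queue as a distributed linked list in which each controller knows only its own successor. The names `Bus`, `LockController`, `acquire`, `release`, `receive_token`, and the transaction counter are hypothetical.

```python
class Bus:
    """Hypothetical shared bus that counts every transaction placed on it."""

    def __init__(self):
        self.transactions = 0
        self.controllers = []

    def broadcast_request(self, requester):
        # One bus transaction; every controller snoops the request.
        self.transactions += 1
        for c in self.controllers:
            c.snoop_request(requester)

    def send_token(self, dst):
        # Point-to-point lock handoff: one bus transaction.
        self.transactions += 1
        dst.receive_token()


class LockController:
    """Hypothetical per-core controller managing one lock variable."""

    def __init__(self, core_id, bus, holds_token=False):
        self.core_id = core_id
        self.bus = bus
        self.has_token = holds_token  # lock token is present locally
        self.in_use = False           # local core is in the critical section
        self.waiting = False          # local core has requested the lock
        self.is_tail = holds_token    # last element of the distributed queue
        self.successor = None         # next waiter, known only to this controller
        self.wakeups = 0              # times the sleeping core was woken
        bus.controllers.append(self)

    def acquire(self):
        """Return True if the lock is granted now, False if the core may sleep."""
        if self.in_use or self.waiting:
            raise RuntimeError("core already holds or awaits the lock")
        if self.has_token:
            self.in_use = True        # best case: zero bus transactions
            return True
        self.waiting = True
        self.bus.broadcast_request(self)
        return self.in_use            # True only if an idle token was handed over

    def snoop_request(self, requester):
        if requester is self:
            self.is_tail = True       # requester becomes the new queue tail
        elif self.is_tail:
            self.is_tail = False
            self.successor = requester  # link requester behind this core
            if self.has_token and not self.in_use:
                self._hand_off()      # idle token: pass it on immediately

    def release(self):
        if not self.in_use:
            raise RuntimeError("release without holding the lock")
        self.in_use = False
        if self.successor is not None:
            self._hand_off()
        # Otherwise keep the token; a local re-acquire needs no bus traffic.

    def _hand_off(self):
        nxt, self.successor = self.successor, None
        self.has_token = False
        self.bus.send_token(nxt)

    def receive_token(self):
        # The controller knows the exact moment the lock arrives,
        # so it can wake its core instead of letting it spin.
        self.has_token = True
        self.waiting = False
        self.in_use = True
        self.wakeups += 1
```

In this model, a core that already holds the idle token re-acquires the lock with zero bus transactions; a contended acquire costs one broadcast request, and each release costs at most one point-to-point handoff. Waiting cores never spin: `acquire` returning `False` is the point at which a core could enter a low-power state, and `receive_token` marks the moment to wake it.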