Low-cost and energy-efficient distributed synchronization for embedded multiprocessors

Authors:
Chenjie Yu;Peter Petrov
Affiliations:
Department of Electrical and Computer Engineering, University of Maryland, College Park, MD;Department of Electrical and Computer Engineering, University of Maryland, College Park, MD
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2010

Abstract

We present a framework for a distributed, low-cost implementation of synchronization mechanisms for embedded shared-memory multiprocessors. The proposed architecture implements queued-lock semantics in a completely decentralized manner, using low-cost local synchronization controllers that run distributed synchronization management protocols. The approach achieves three major benefits. First, it completely eliminates the bus contention traffic that arises when multiple cores compete for a synchronization variable. Second, its best-case lock-acquisition latency is extremely low, requiring zero bus transactions. Third, it opens multiple avenues for high energy efficiency: the local synchronization controllers can determine, without any bus transactions or local cache spinning, exactly when a lock becomes available to, or a barrier is enabled at, the local processor. This makes it possible for the system software or the thread library to employ various low-power policies.
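
For readers unfamiliar with the term, the queued-lock semantics mentioned in the abstract can be illustrated with a minimal software sketch. This is a toy, single-threaded model written for illustration only. It is not the paper's hardware protocol, and the names `QueuedLock`, `acquire`, `release`, and `grant_order` are hypothetical. It shows only the ordering property: a free lock is granted immediately, and contending cores receive the lock in the order in which they requested it.

```python
from collections import deque


class QueuedLock:
    """Toy model of queued-lock (FIFO) semantics.

    Assumption: this is a software illustration, not the distributed
    hardware controllers described in the abstract.
    """

    def __init__(self):
        self.owner = None        # id of the core holding the lock, or None
        self.waiters = deque()   # ids of waiting cores, in arrival order

    def acquire(self, core):
        """Request the lock; return True if granted at once, else enqueue."""
        if self.owner is None:
            self.owner = core
            return True
        self.waiters.append(core)
        return False

    def release(self, core):
        """Release the lock and hand it to the oldest waiter, if any."""
        if self.owner != core:
            raise ValueError(f"core {core} does not hold the lock")
        self.owner = self.waiters.popleft() if self.waiters else None
        return self.owner


def grant_order(requests):
    """Order in which cores obtain the lock if all request before any release."""
    lock = QueuedLock()
    for core in requests:
        lock.acquire(core)
    order = []
    while lock.owner is not None:
        order.append(lock.owner)
        lock.release(lock.owner)
    return order
```

In this model a waiting core never retries: it learns that it owns the lock only at the handoff in `release`. That is the property the abstract relies on when it says a waiting processor need not spin, which in turn leaves room for a low-power policy while it waits.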