This paper explores techniques for optimizing the synchronization mechanisms of MPSoCs built on a complex interconnect (Network-on-Chip) and targeted at future mobile systems. We propose a memory controller architecture designed to minimize synchronization overhead. The key idea is to perform locally, at the memory, those synchronization operations that require continuous polling of a shared variable and therefore suffer from high contention (e.g., spin locks). We introduce a hardware module, the Synchronization-operation Buffer (SB), which augments the memory controller and queues and manages the requests issued by the processors. Experimental validation was carried out using GRAPES, a cycle-accurate performance/power simulation platform. For an 8-processor target architecture, we show that the proposed solution achieves up to 40% performance improvement and 25% energy saving compared with synchronization based on caching the synchronization variables under a directory-based coherence protocol. Furthermore, we demonstrate that the proposed approach scales as the number of processors increases.
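The queueing idea behind the SB can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the hardware design described in the paper: the class name `SyncBuffer`, its methods, and the FIFO grant order are all hypothetical choices made for illustration. It shows how a buffer next to the memory could record a lock request once and grant the lock later, instead of the processor polling the shared variable repeatedly.

```python
from collections import deque


class SyncBuffer:
    """Toy model of a lock-request queue kept next to the memory controller.

    Hypothetical illustration: names and the FIFO policy are assumptions,
    not taken from the paper.
    """

    def __init__(self):
        self.owner = {}    # lock address -> id of the processor holding it
        self.waiting = {}  # lock address -> FIFO of processors waiting for it

    def acquire(self, addr, cpu):
        """Return True if the lock is granted now, False if the request is queued."""
        if addr not in self.owner:
            self.owner[addr] = cpu
            return True
        # Lock is busy: remember the request instead of making the CPU poll.
        self.waiting.setdefault(addr, deque()).append(cpu)
        return False

    def release(self, addr, cpu):
        """Release the lock; return the processor granted next, or None."""
        if self.owner.get(addr) != cpu:
            raise ValueError("release by a processor that does not hold the lock")
        queue = self.waiting.get(addr)
        if queue:
            next_cpu = queue.popleft()   # assumed policy: oldest request first
            self.owner[addr] = next_cpu
            return next_cpu
        del self.owner[addr]
        return None
```

In this model each waiting processor issues a single request and later receives a single grant, so no polling traffic crosses the interconnect while the lock is held. For example, if processors 0, 1 and 2 request the same lock in that order, processor 0 is granted it immediately and each release hands it to the next processor in the queue.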