Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The future of multiprocessor systems-on-chips
Proceedings of the 41st annual Design Automation Conference
NoC Synthesis Flow for Customized Domain Specific Multiprocessor Systems-on-Chip
IEEE Transactions on Parallel and Distributed Systems
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Distributed and low-power synchronization architecture for embedded multiprocessors
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Journal of Parallel and Distributed Computing
Energy and throughput efficient transactional memory for embedded multicore systems
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
The advances in semiconductor technologies have placed MPSoCscenter stage as a standard architecture for embedded applications of ever increasing complexity. Efficient utilization of the ample hardware resources requires applications to be decomposed into fine-grained threads, engendering in turn a large amount of interprocessor communications. While fine-grained on-chip interconnects can reduce the data transfer overhead, the traditional synchronization mechanisms, such as spin locks and barriers, still cause significant contention in polling shared variables. To overcome this issue, in this paper we propose a light-weight distributed synchronization mechanism which statically encodes the semantically correct order of accesses to each shared variable. A sharp reduction in the number of code bits is attained through a reference coloring algorithm, which furthermore enables an implementation within negligible hardware overhead. This light-weight synchronization mechanism allows dependent threads to frequently exchange data during execution, in turn enabling the exploration of fine-grained parallelism for applications with complex dependences.