Distributing Hot-Spot Addressing in Large-Scale Multiprocessors
IEEE Transactions on Computers
Two algorithms for barrier synchronization
International Journal of Parallel Programming
Data distribution support on distributed shared memory multiprocessors
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
MagPIe: MPI's collective communication operations for clustered wide area systems
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Extending OpenMP for NUMA machines
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Efficient Barriers for Distributed Shared Memory Computers
Proceedings of the 8th International Symposium on Parallel Processing
Performance of Cluster-enabled OpenMP for the SCASH Software Distributed Shared Memory System
CCGRID '03 Proceedings of the 3st International Symposium on Cluster Computing and the Grid
Exploiting Hierarchy in Parallel Computer Networks to Optimize Collective Operation Performance
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Analyzing On-Chip Communication in a MPSoC Environment
Proceedings of the conference on Design, automation and test in Europe - Volume 2
The OpenMP Source Code Repository
PDP '05 Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing
OpenUH: an optimizing, portable OpenMP compiler: Research Articles
Concurrency and Computation: Practice & Experience - Current Trends in Compilers for Parallel Computers (CPC2006)
Effective OpenMP Implementation and Translation For Multiprocessor System-On-Chip without Using OS
ASP-DAC '07 Proceedings of the 2007 Asia and South Pacific Design Automation Conference
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Efficiency and scalability of barrier synchronization on NoC based many-core architectures
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
IEEE Transactions on Parallel and Distributed Systems
A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Implementing OpenMP on a high performance embedded multicore MPSoC
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Prototype design of cluster-based homogeneous multiprocessor system-on-chip
ASID'09 Proceedings of the 3rd international conference on Anti-Counterfeiting, security, and identification in communication
A practical OpenMP compiler for system on chips
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Exploring programming model-driven QoS support for NoC-based platforms
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
An OpenMP Compiler for Efficient Use of Distributed Scratchpad Memory in MPSoCs
IEEE Transactions on Computers
Improving the programmability of STHORM-based heterogeneous systems with offload-enabled OpenMP
Proceedings of the First International Workshop on Many-core Embedded Systems
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
The ever-increasing complexity of MPSoCs is putting the production of software on the critical path in embedded system development. Several programming models and tools have been proposed in the recent past that aim to facilitate application development for embedded MPSoCs. OpenMP is a mature and easy-to-use standard for shared memory programming, which has recently been successfully adopted in embedded MPSoC programming as well. To achieve performance, however, it is necessary that the implementation of OpenMP constructs efficiently exploits the many peculiarities of MPSoC hardware, and that custom features are provided to the programmer to control it. In this paper we consider a representative template of a modern multi-cluster embedded MPSoC and present an extensive evaluation of the cost associated with supporting OpenMP on such a machine, investigating several implementation variants that are aware of the memory hierarchy and of the heterogeneous interconnection.