Heterogeneous architectures that pair a fast-clocked, moderately multicore "host" processor with a many-core accelerator are a promising way to satisfy the ever-increasing GOps/W requirements of embedded systems-on-chip. However, heterogeneous computing comes at the cost of increased programming complexity, requiring major rewrites of applications in a low-level programming style (e.g., OpenCL). In this paper we present a programming model, compiler and runtime system for a prototype board from STMicroelectronics featuring an ARM9 host and a STHORM many-core accelerator. The programming model is based on OpenMP, with additional directives to efficiently program the accelerator from a single host program. The proposed multi-ISA compilation toolchain hides the entire process of outlining an accelerator program, compiling and loading it onto the STHORM platform, and implementing data sharing between the host and the accelerator. Our experimental results show that we achieve performance very close to that of hand-optimized OpenCL codes, at significantly lower programming complexity.
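To illustrate the single-source, directive-based style described above, here is a minimal sketch. The abstract does not give the syntax of the proposed directives, so the sketch assumes the standard OpenMP 4.0 `target` construct and `map` clauses as a stand-in; the function name `saxpy_offload` and the kernel itself are hypothetical and not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i].
 * Assumption: standard OpenMP 4.0+ offload syntax is used here as a
 * stand-in for the paper's own directives, which the abstract does not show. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* map(to:) copies x to the accelerator; map(tofrom:) copies y there
     * and back. The compiler outlines the region into accelerator code.
     * Without an offload-capable toolchain the loop simply runs on the host. */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The host program calls `saxpy_offload` like any other C function. The outlining, data transfer and kernel launch that an OpenCL version would spell out by hand are left to the compiler and runtime.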