libEOMP: a portable OpenMP runtime library based on MCA APIs for embedded systems
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
Multicore embedded systems are widely used in telecommunication systems, robotics, medical applications, and more. While they offer high performance at low power, programming them efficiently is still a challenge. To exploit the capabilities that the hardware offers, software developers are expected to handle many low-level details of programming, including utilizing DMA, ensuring cache coherency, and inserting synchronization primitives explicitly. State-of-the-art solutions rely on software toolchains that are too vendor-specific, tying the software to particular hardware and leaving no room for portability. In this paper we present a runtime system that explores mapping a high-level programming model, OpenMP, onto multicore embedded systems. A key feature of our scheme is that, unlike existing approaches that largely rely on POSIX threads, it leverages the Multicore Association (MCA) APIs as an OpenMP translation layer. The MCA APIs are a set of low-level APIs that handle resource management, inter-process communication, and task scheduling for multicore embedded systems. By deploying the MCA APIs, our runtime captures the characteristics of multicore embedded systems more effectively than POSIX threads do. Furthermore, the MCA layer makes our runtime implementation portable across various architectures. Programmers thus need to maintain only a single OpenMP code base that is compatible with various compilers and portable across different types of platforms. We have evaluated our runtime system using several embedded benchmarks. The experiments demonstrate promising performance that is competitive with the native approach for the platform.
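To make the "single OpenMP code base" idea concrete, here is a minimal sketch of the kind of source an OpenMP runtime would be asked to execute. It is an illustration under stated assumptions, not code from the paper: the function name `dot_product` and its signature are hypothetical, and the sketch does not show the runtime's internals or any MCA call. It uses only a standard OpenMP directive, so how the loop is split across cores is left to whichever compiler and runtime build it.

```c
/* Hypothetical example, not taken from the paper.
 * Computes the dot product of two integer arrays of length n.
 * The application source contains no thread-creation or
 * platform-specific calls; the only parallel construct is a
 * standard OpenMP directive. */
long dot_product(const int *a, const int *b, int n)
{
    long sum = 0;

    /* Each thread accumulates a private partial sum; OpenMP combines
     * the partial sums into `sum` when the loop finishes. If the
     * compiler is run without OpenMP support, the pragma is ignored
     * and the loop runs serially with the same result. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += (long)a[i] * b[i];
    }

    return sum;
}
```

With an OpenMP-capable compiler (for example, GCC with `-fopenmp`), the loop iterations are distributed across the available cores. The same file should build unchanged on another platform as long as a conforming OpenMP implementation is available there, which is the portability property the abstract describes.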