Cilk: an efficient multithreaded runtime system
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Carbon: architectural support for fine-grained parallelism on chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
IEEE Transactions on Parallel and Distributed Systems
Faster and Better: A Machine Learning Approach to Corner Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence
Evaluation of OpenMP task scheduling strategies
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Design and Implementation of OpenMP Tasks in the OMPi Compiler
PCI '11 Proceedings of the 2011 15th Panhellenic Conference on Informatics
Proceedings of the 49th Annual Design Automation Conference
Fast and lightweight support for nested parallelism on cluster-based embedded many-cores
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Cluster-based architectures are increasingly being adopted to design embedded many-cores. These platforms can deliver very high peak performance within a contained power envelope, provided that programmers can make effective use the available parallel cores. This is becoming an extremely difficult task, as embedded applications are growing in complexity and exhibit irregular and dynamic parallelism. The OpenMP tasking extensions represent a powerful abstraction to capture this form of parallelism. However, efficiently supporting it on cluster-based embedded SoCs is not easy, because the fine-grained parallel workload present in embedded applications can not tolerate high memory and run-time overheads. In this paper we present our design of the runtime support layer to OpenMP tasking for an embedded shared memory cluster, identifying key aspects to achieving performance and discussing important architectural support to removing major bottlenecks.