Fast and lightweight support for nested parallelism on cluster-based embedded many-cores

Authors:
Andrea Marongiu; Paolo Burgio; Luca Benini
Affiliations:
DEIS - University of Bologna, Bologna - Italy; DEIS - University of Bologna, Bologna - Italy; DEIS - University of Bologna, Bologna - Italy
Venue:
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2012

Abstract

Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared memory clusters. They use a hierarchical interconnect, with a crossbar-like medium inside each cluster and a network-on-chip (NoC) at the global level, which makes memory access non-uniform (NUMA). Nested parallelism is a powerful programming abstraction for these architectures: a first level of parallelism can distribute coarse-grained tasks to clusters, and additional levels of fine-grained parallelism can distribute work to the processors within a cluster. This paper presents lightweight, highly optimized support for nested parallelism on cluster-based embedded many-cores. We assess the cost of enabling multi-level parallelization and demonstrate that our techniques make it possible to extract high degrees of parallelism.
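To make the two-level scheme concrete, here is a minimal sketch using standard OpenMP nested `parallel for` regions (OpenMP 3.0 or later). It is not the authors' runtime. The constants `NUM_CLUSTERS` and `CORES_PER_CLUSTER` and the function `nested_scale` are hypothetical, and the sketch assumes the runtime places each outer thread on a distinct cluster and each inner team on that cluster's cores, which standard OpenMP does not guarantee by itself.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical machine shape, chosen only for illustration. */
#define NUM_CLUSTERS      4
#define CORES_PER_CLUSTER 16

/* Multiply a[0..n-1] by k using two levels of parallelism.
 * Outer level: one thread per (assumed) cluster, each owning a contiguous
 *              chunk of the array (coarse-grained task).
 * Inner level: a team of CORES_PER_CLUSTER threads splits that chunk's
 *              iterations (fine-grained work).
 * Without OpenMP the pragmas are ignored and the loops run sequentially,
 * producing the same result. */
void nested_scale(float *a, int n, float k)
{
    assert(n % NUM_CLUSTERS == 0);   /* keep the chunking simple */
    int chunk = n / NUM_CLUSTERS;

#ifdef _OPENMP
    /* Allow the inner parallel regions to create real teams;
     * otherwise nested regions may be serialized. */
    omp_set_max_active_levels(2);
#endif

    #pragma omp parallel for num_threads(NUM_CLUSTERS)
    for (int c = 0; c < NUM_CLUSTERS; c++) {
        /* c and chunk are shared within the inner team; i is private. */
        #pragma omp parallel for num_threads(CORES_PER_CLUSTER)
        for (int i = c * chunk; i < (c + 1) * chunk; i++)
            a[i] *= k;
    }
}
```

The sketch does not model the cost of opening each inner team, which is the overhead that lightweight support for nested parallelism aims to keep low so that the fine-grained level pays off.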