Fast and lightweight support for nested parallelism on cluster-based embedded many-cores

  • Authors:
  • Andrea Marongiu;Paolo Burgio;Luca Benini

  • Affiliations:
  • DEIS - University of Bologna, Bologna - Italy;DEIS - University of Bologna, Bologna - Italy;DEIS - University of Bologna, Bologna - Italy

  • Venue:
  • DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
  • Year:
  • 2012

Abstract

Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared memory clusters. A hierarchical interconnection system is used -- with a crossbar-like medium inside each cluster and a network-on-chip (NoC) at the global level -- which makes memory access non-uniform (NUMA). Nested parallelism is a powerful programming abstraction for these architectures: a first level of parallelism distributes coarse-grained tasks to clusters, and additional levels of fine-grained parallelism are distributed to the processors within a cluster. This paper presents lightweight and highly optimized support for nested parallelism on cluster-based embedded many-cores. We assess the cost of enabling multi-level parallelization and demonstrate that our techniques make it possible to extract high degrees of parallelism.
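The two-level decomposition described in the abstract can be illustrated with a minimal sketch. It is not the paper's runtime: it uses Python thread pools as a stand-in for clusters and cores, and the names `NUM_CLUSTERS`, `CORES_PER_CLUSTER`, `cluster_task`, and `fine_grained_work` are hypothetical, chosen only for illustration. The outer pool hands one coarse-grained block to each "cluster", and each cluster opens its own inner pool to spread fine-grained chunks over its "cores".

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CLUSTERS = 4        # hypothetical number of clusters
CORES_PER_CLUSTER = 4   # hypothetical number of cores in each cluster


def fine_grained_work(chunk):
    """Inner-level work item, executed by one core of a cluster."""
    return sum(x * x for x in chunk)


def split(items, parts):
    """Split items into `parts` contiguous, near-equal slices."""
    size, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def cluster_task(block):
    """Outer-level task: one cluster opens an inner parallel region."""
    with ThreadPoolExecutor(max_workers=CORES_PER_CLUSTER) as cores:
        chunks = split(block, CORES_PER_CLUSTER)
        return sum(cores.map(fine_grained_work, chunks))


def nested_sum_of_squares(data):
    """Level 1 distributes blocks to clusters; level 2 runs inside each."""
    with ThreadPoolExecutor(max_workers=NUM_CLUSTERS) as clusters:
        blocks = split(data, NUM_CLUSTERS)
        return sum(clusters.map(cluster_task, blocks))


if __name__ == "__main__":
    print(nested_sum_of_squares(list(range(1000))))
```

Each inner pool is private to its outer task, which mirrors the idea that fine-grained work stays inside a cluster. Creating a pool per outer task is exactly the kind of region-creation overhead that a lightweight runtime would aim to minimize, so the sketch shows the structure of nested parallelism rather than an efficient implementation of it.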