Landing openMP on cyclops-64: an efficient mapping of openMP to a many-core system-on-a-chip

  • Authors:
  • Juan del Cuvillo;Weirong Zhu;Guang Gao

  • Affiliations:
  • University of Delaware, Newark, DE;University of Delaware, Newark, DE;University of Delaware, Newark, DE

  • Venue:
  • Proceedings of the 3rd conference on Computing frontiers
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents our experience mapping OpenMP parallel programming model to the IBM Cyclops-64 (C64) architecture. The C64 employs a many-core-on-a-chip design that integrates processing logic (160 thread units), embedded memory (5MB) and communication hardware on the same die. Such a unique architecture presents new opportunities for optimization. Specifically, we consider the following three areas: (1) a memory aware runtime library that places frequently used data structures in scratchpad memory; (2) a unique spin lock algorithm for shared memory synchronization based on in-memory atomic instructions and native support for thread level execution; (3) a fast barrier that directly uses C64 hardware support for collective synchronization. All three optimizations together, result in an 80% overhead reduction for language constructs in OpenMP. We believe that such a drastic reduction in the cost of managing parallelism makes OpenMP more amenable for writing parallel programs on the C64 platform.