Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture

  • Authors:
  • David Ródenas; Xavier Martorell; Eduard Ayguadé; Jesús Labarta; George Almási; Călin Caşcaval; José Castaños; José Moreira

  • Affiliations:
  • Barcelona Supercomputing Center, UPC, Campus Nord - C6, Jordi Girona 1-3, 08034 Barcelona, Spain; (Corresponding author. Tel.: +34 93 405 40 42; Fax: +34 93 401 70 55; E-mail: xavim@ac.upc.edu) Barcelona Supercomputing Center, UPC, Campus Nord - C6, Jordi Girona 1-3, 08034 Barcelona, Spain; Barcelona Supercomputing Center, UPC, Campus Nord - C6, Jordi Girona 1-3, 08034 Barcelona, Spain; Barcelona Supercomputing Center, UPC, Campus Nord - C6, Jordi Girona 1-3, 08034 Barcelona, Spain; IBM T.J. Watson Research Center, 1101 Kitchawan Road, Route 134, Yorktown Heights, NY 10598, USA; IBM T.J. Watson Research Center, 1101 Kitchawan Road, Route 134, Yorktown Heights, NY 10598, USA; IBM T.J. Watson Research Center, 1101 Kitchawan Road, Route 134, Yorktown Heights, NY 10598, USA; IBM T.J. Watson Research Center, 1101 Kitchawan Road, Route 134, Yorktown Heights, NY 10598, USA

  • Venue:
  • Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
  • Year:
  • 2006

Abstract

This paper evaluates and analyzes multilevel parallelism on a chip multiprocessor (CMP) architecture. The environment is based on the experimental IBM BG/Cyclops architecture, on which we ran the multi-zone parallel benchmarks. Multilevel parallelism is spawned using the Nanos OpenMP execution environment. We performed the analysis with different execution parameters to evaluate different hardware thread distributions, cache utilization, and thread grouping configurations. Our results demonstrate that a large number of thread groups and good balancing algorithms are critical for high performance. We also show that a small number of threads can share the same data cache to increase performance, but a large number of threads should not share the same data cache.