Evaluation of OpenMP for the cyclops multithreaded architecture

Authors:
George Almasi;Eduard Ayguadé;Călin Caşcaval;José Castaños;Jesús Labarta;Francisco Martínez;Xavier Martorell;José Moreira
Affiliations:
IBM Thomas J. Watson Research Center, Yorktown Heights, NY;CEPBA, IBM Research Institute, UPC, Barcelona, Spain;IBM Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, Yorktown Heights, NY;CEPBA, IBM Research Institute, UPC, Barcelona, Spain;CEPBA, IBM Research Institute, UPC, Barcelona, Spain;CEPBA, IBM Research Institute, UPC, Barcelona, Spain;IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Venue:
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Year:
2003

Citing 19
Cited 5

Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Tuning compiler optimizations for simultaneous multithreading

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Active pages: a computation model for intelligent memory

Proceedings of the 25th annual international symposium on Computer architecture
A bandwidth-efficient architecture for media processing

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors

ICS '99 Proceedings of the 13th international conference on Supercomputing
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
α-coral: a multigrain, multithreaded processor architecture

ICS '01 Proceedings of the 15th international conference on Supercomputing
Multi-processor performance on the Tera MTA

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
OpenMP on networks of workstations

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Baring It All to Software: Raw Machines

Computer
Simultaneous Multithreading: A Platform for Next-Generation Processors

IEEE Micro
A Library Implementation of the Nano-Threads Programming Model

Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
Pursuing a Petaflop: Point Designs for 100 TF Computers Using PIM Technologies

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Supporting Fine-Grained Synchronization on a Simultaneous Multithreading Processor

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
FlexRAM: Toward an Advanced Intelligent Memory System

ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Evaluation of a Multithreaded Architecture for Cellular Computing

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Blue Gene: a vision for protein science using a petaflop supercomputer

IBM Systems Journal - Deep computing for the life sciences
POWER4 system microarchitecture

IBM Journal of Research and Development

Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Landing openMP on cyclops-64: an efficient mapping of openMP to a many-core system-on-a-chip

Proceedings of the 3rd conference on Computing frontiers
Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture

Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
An evaluation of OpenMP on current and emerging multithreaded/multicore processors

IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Performance characteristics of OpenMP language constructs on a many-core-on-a-chip architecture

IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multithreaded architectures have the potential of tolerating large memory and functional unit latencies and increase resource utilization. The Blue Gene/Cyclops architecture, being developed at the IBM T. J. Watson Research Center, is one such systems that offers massive intra-chip parallelism. Although the BG/C architecture was initially designed to execute specific applications, we believe that it can be effectively used on a broad range of parallel numerical applications. Programming such applications for this unconventional design requires a significant porting effort when using the basic built-in mechanisms for thread management and synchronization. In this paper, we describe the implementation of an OpenMP environment for parallelizing applications, currently under development at the CEPBA-IBM Research Institute, targeting BG/C. The environment is evaluated with a set of simple numerical kernels and a subset of the NAS OpenMP benchmarks. We identify issues that w ere not initially considered in the design of the BG/C architecture to support a programming model such as OpenMP. We also evaluate features currently offered by the BG/C architecture that should be considered in the implementation of an efficient OpenMP layer for massive intra-chip parallel architectures.