Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Tuning compiler optimizations for simultaneous multithreading
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors
ICS '99 Proceedings of the 13th international conference on Supercomputing
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
α-coral: a multigrain, multithreaded processor architecture
ICS '01 Proceedings of the 15th international conference on Supercomputing
Multi-processor performance on the Tera MTA
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
OpenMP on networks of workstations
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
A Library Implementation of the Nano-Threads Programming Model
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
Pursuing a Petaflop: Point Designs for 100 TF Computers Using PIM Technologies
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Supporting Fine-Grained Synchronization on a Simultaneous Multithreading Processor
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
FlexRAM: Toward an Advanced Intelligent Memory System
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Evaluation of a Multithreaded Architecture for Cellular Computing
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Blue Gene: a vision for protein science using a petaflop supercomputer
IBM Systems Journal - Deep computing for the life sciences
POWER4 system microarchitecture
IBM Journal of Research and Development
Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Landing openMP on cyclops-64: an efficient mapping of openMP to a many-core system-on-a-chip
Proceedings of the 3rd conference on Computing frontiers
Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
An evaluation of OpenMP on current and emerging multithreaded/multicore processors
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Performance characteristics of OpenMP language constructs on a many-core-on-a-chip architecture
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Hi-index | 0.00 |
Multithreaded architectures have the potential of tolerating large memory and functional unit latencies and increase resource utilization. The Blue Gene/Cyclops architecture, being developed at the IBM T. J. Watson Research Center, is one such systems that offers massive intra-chip parallelism. Although the BG/C architecture was initially designed to execute specific applications, we believe that it can be effectively used on a broad range of parallel numerical applications. Programming such applications for this unconventional design requires a significant porting effort when using the basic built-in mechanisms for thread management and synchronization. In this paper, we describe the implementation of an OpenMP environment for parallelizing applications, currently under development at the CEPBA-IBM Research Institute, targeting BG/C. The environment is evaluated with a set of simple numerical kernels and a subset of the NAS OpenMP benchmarks. We identify issues that w ere not initially considered in the design of the BG/C architecture to support a programming model such as OpenMP. We also evaluate features currently offered by the BG/C architecture that should be considered in the implementation of an efficient OpenMP layer for massive intra-chip parallel architectures.