This paper evaluates and analyzes multilevel parallelism on a chip multiprocessor (CMP) architecture. The environment is based on the experimental IBM BG/Cyclops architecture, on which we ran the multi-zone parallel benchmarks. Multilevel parallelism is spawned using the Nanos OpenMP execution environment. We performed the analysis with different execution parameters in order to evaluate different hardware thread distributions, cache utilization, and thread grouping configurations. Our results demonstrate that a large number of thread groups and good balancing algorithms are critical for high performance. We also show that a small number of threads can share the same data cache to increase performance, whereas a large number of threads should not share the same data cache.
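To make the role of balancing across thread groups concrete, the following is a minimal sketch of one generic heuristic: assign each zone, largest first, to the group with the least work so far. The abstract does not say which balancing algorithm was used, so this is an illustration under stated assumptions, not the paper's method. The names `assign_zones` and `imbalance`, and the zone sizes in the usage note, are hypothetical.

```python
import heapq


def assign_zones(zone_sizes, num_groups):
    """Greedy balancing sketch (assumed heuristic, not taken from the paper).

    Zones are visited from largest to smallest, and each one is given to
    the thread group that currently has the least accumulated work.
    Returns a list with one list of zone indices per group.
    """
    # Min-heap of (accumulated_load, group_id) pairs.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    largest_first = sorted(enumerate(zone_sizes), key=lambda pair: -pair[1])
    for zone, size in largest_first:
        load, g = heapq.heappop(heap)
        groups[g].append(zone)
        heapq.heappush(heap, (load + size, g))
    return groups


def imbalance(zone_sizes, groups):
    """Ratio of the heaviest group's load to the mean load (1.0 is ideal)."""
    loads = [sum(zone_sizes[z] for z in grp) for grp in groups]
    return max(loads) / (sum(loads) / len(loads))
```

For example, with hypothetical zone sizes `[8, 7, 6, 5, 4, 3, 2, 1]` and two groups, `assign_zones` places zones 0, 3, 4, 7 in one group and zones 1, 2, 5, 6 in the other, so both groups carry the same total work. In a two-level scheme, each group would then split the work of its zones among its own threads.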