High-end distributed and distributed shared memory platforms with many thousands of cores will be deployed in the coming years to solve the toughest technical problems. Their individual nodes will be heterogeneous, multithreaded, multicore systems, capable of executing many threads of control and equipped with a deep memory hierarchy. For example, the petascale architecture to be put into production at the US National Center for Supercomputing Applications (NCSA) in 2011 is based on the IBM Power7 chip, which uses multicore processor technology. Thousands of compute nodes with over 200,000 cores are envisioned. The Roadrunner system that will be deployed at the Los Alamos National Laboratory (LANL) is expected to have heterogeneous nodes, configured with both AMD Opteron and IBM Cell processors, and a similar number of cores.