Performance Evaluation of the Omni OpenMP Compiler
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
CellSs: making it easier to program the cell broadband engine processor
IBM Journal of Research and Development
Prefetching irregular references for software cache on cell
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Dma-based prefetching for i/o-intensive workloads on the cell architecture
Proceedings of the 5th conference on Computing frontiers
A Proposal for Task Parallelism in OpenMP
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor
Languages and Compilers for Parallel Computing
International Journal of Parallel Programming
SPENK: adding another level of parallelism on the cell broadband engine
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Evaluation of memory performance on the cell BE with the SARC programming model
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
IEEE Transactions on Parallel and Distributed Systems
A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Optimizing the use of static buffers for DMA on a CELL chip
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Runtime address space computation for SDSM systems
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Extending the OpenMP tasking model to allow dependent tasks
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
OpenMP extensions for heterogeneous architectures
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Polyhedral parallel code generation for CUDA
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
As an answer to the forthcoming heterogeneous multicore and accelerator–based architectures, we have proposed some syntactic extensions to C in the form of C pragmas, based on OpenMP, that make easier for programmers to offload parts of their applications to the auxiliary processors. Offloaded tasks can be made more profitable using a simple blocking strategy. And the runtime system is used to better support computation and communication overlap, while moving data to and from accelerators. In order to prove the feasibility and usefulness of our proposal, we have considered the IBM Cell architecture. The performance of the whole system has been evaluated using HPCC STREAM Triad and several NAS benchmarks. We present their evaluation and a detailed performance breakdown at the level of parallel regions. We also classify the parallel regions according to their suitability to be exploited in accelerators. Overall, our performance is better compared to the results obtained from the IBM compiler for the Cell processor.