Vectorization for SIMD architectures with alignment constraints
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
The potential of the cell processor for scientific computing
Proceedings of the 3rd conference on Computing frontiers
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Optimizing the use of static buffers for DMA on a CELL chip
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
COMIC: a coherent shared memory interface for cell be
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Evaluation of memory performance on the cell BE with the SARC programming model
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Hierarchical Task-Based Programming With StarSs
International Journal of High Performance Computing Applications
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Achieving high memory performance from heterogeneous architectures with the SARC programming model
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
State-of-the-art in heterogeneous computing
Scientific Programming
Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
An efficient software radio framework for WiMAX physical layer on cell multicore platform
ICC'09 Proceedings of the 2009 IEEE international conference on Communications
The reverse-acceleration model for programming petascale hybrid systems
IBM Journal of Research and Development
Exploring programming model-driven QoS support for NoC-based platforms
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Compiler-directed memory management for heterogeneous MPSoCs
Journal of Systems Architecture: the EUROMICRO Journal
Cost-aware function migration in heterogeneous systems
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Programming heterogeneous multicore systems using threading building blocks
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Automatic data distribution for improving data locality on the cell BE architecture
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Analysis of task offloading for accelerators
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Development process for clusters on a reconfigurable chip
Computers and Electrical Engineering
A Simple Compressive Sensing Algorithm for Parallel Many-Core Architectures
Journal of Signal Processing Systems
libEOMP: a portable OpenMP runtime library based on MCA APIs for embedded systems
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
Exploring heterogeneous scheduling using the task-centric programming model
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Portable mapping of openMP to multicore embedded systems using MCA APIs
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Experimenting with low-overhead OpenMP runtime on IBM Blue Gene/Q
IBM Journal of Research and Development
Simple, portable and fast SIMD intrinsic programming: generic simd library
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Hi-index | 0.00 |
The Cell processor is a heterogeneous multi-core processor with one power processing engine (PPE) core and eight synergistic processing engine (SPE) cores. There is a significant amount of ongoing research in programming models and tools that attempts to make it easy to exploit the computation power of the Cell architecture. In our work, we explore supporting OpenMP on the Cell processor. It is attractive to support OpenMP because programmers can continue using their familiar programming model, and existing code can be re-used. We base our work on IBM's XL compiler, and developed new components in the XL compiler and a new runtime library. Three major issues are addressed: (1) synchronization support on heterogeneous cores; (2) code generation targeting the different instruction sets; (3) data transfers and implement the OpenMP memory model. We present experimental results for some SPEC OMP 2001 and NAS benchmarks to demonstrate the effectiveness of this approach. A visualization tool based on Paraver is also used to provide some insights into actual thread and synchronization behaviors.