Performance Evaluation of the Omni OpenMP Compiler
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Landing openMP on cyclops-64: an efficient mapping of openMP to a many-core system-on-a-chip
Proceedings of the 3rd conference on Computing frontiers
Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture
HPCS '06 Proceedings of the 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment
Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Prefetching irregular references for software cache on cell
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Orchestrating data transfer for the cell/B.E. processor
Proceedings of the 22nd annual international conference on Supercomputing
Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Optimization of dense matrix multiplication on IBM cyclops-64: challenges and experiences
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
TL-DAE: thread-level decoupled access/execution for OpenMP on the cyclops-64 many-core processor
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
An application-centric evaluation of OpenCL on multi-core CPUs
Parallel Computing
Hi-index | 0.00 |
Programming a multicore processor is difficult. It is even more difficult if the processor has software-managed memory hierarchy, e.g. the IBM Cyclops-64 (C64). A widely accepted parallel programming solution for multicore processor is OpenMP. Currently, all OpenMP directives are only used to decompose computation code (such as loop iterations, tasks, code sections, etc.). None of them can be used to control data movement, which is crucial for the C64 performance. In this paper, we propose a technique called tile percolation . This method provides the programmer with a set of OpenMP pragma directives. The programmer can use these directives to annotate their program to specify where and how to perform data movement. The compiler will then generate the required code accordingly. Our method is a semi-automatic code generation approach intended to simplify a programmer's work. The paper provides (a) an exploration of the possibility of developing pragma directives for semi-automatic data movement code generation in OpenMP; (b) an introduction of techniques used to implement tile percolation including the programming API, the code generation in compiler, and the required runtime support routines; (c) and an evaluation of tile percolation with a set of benchmarks. Our experimental results show that tile percolation can make the OpenMP programs run on the C64 chip more efficiently.