Recently, Intel introduced a research-prototype many-core processor called the Single-chip Cloud Computer (SCC). The SCC is an experimental processor created by Intel Labs. It contains 48 cores on a single chip. Each core has its own L1 and L2 caches, and there is no hardware support for cache coherence. The SCC supports up to 64 GB of external memory, which all cores can access, and each core dynamically maps external memory into its own address space. In this paper, we present the design and implementation of an OpenCL framework (i.e., a runtime and a compiler) for such many-core architectures without hardware cache coherence. We have found that the OpenCL coherence and consistency model fits the SCC architecture well. Because OpenCL's memory consistency model is weak, relatively few messages and coherence actions are needed to guarantee coherence and consistency between the memory blocks in the SCC. The dynamic memory-mapping mechanism enables our framework to preserve the semantics of OpenCL buffer object operations with small overhead. We have implemented the proposed OpenCL runtime and compiler and evaluated their performance on the SCC with OpenCL applications.
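The central claim above is that OpenCL's relaxed memory consistency lets a runtime keep non-coherent private caches consistent with only a few coherence actions, issued at synchronization points. The Python sketch below is a toy model of that idea, not the paper's runtime. The names `Core`, `flush`, `invalidate`, and `run_kernel`, the dictionary-based cache, and the round-robin distribution of work-items are all hypothetical choices made for illustration. Each core invalidates its cache when a kernel starts and writes back dirty lines when the kernel ends. As a result, the number of coherence actions grows with the number of cores, not with the number of memory accesses.

```python
# Hypothetical toy model: Core, flush, invalidate and run_kernel are
# illustrative names, not part of the SCC toolchain or the paper's runtime.


class Core:
    """A core with a private write-back cache and no hardware coherence."""

    def __init__(self, core_id, memory):
        self.core_id = core_id
        self.memory = memory  # shared external memory (a list of ints)
        self.cache = {}       # address -> cached value
        self.dirty = set()    # addresses written but not yet written back

    def load(self, addr):
        # On a miss, fetch from external memory; a hit may return stale data.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def store(self, addr, value):
        # Writes stay in the private cache until an explicit flush.
        self.cache[addr] = value
        self.dirty.add(addr)

    def flush(self):
        # Write every dirty line back to external memory; return how many.
        for addr in self.dirty:
            self.memory[addr] = self.cache[addr]
        written = len(self.dirty)
        self.dirty.clear()
        return written

    def invalidate(self):
        # Drop all cached lines so later loads re-read external memory.
        # Assumes any dirty lines were already flushed.
        self.cache.clear()


def run_kernel(cores, kernel, global_size):
    """Run `kernel` for each work-item, spreading work-items over cores.

    A kernel's global-memory writes only need to be visible to the host
    after the kernel completes, so coherence actions happen only at kernel
    start and end. Returns the number of invalidate + flush calls.
    """
    actions = 0
    for core in cores:  # kernel start: discard possibly stale lines
        core.invalidate()
        actions += 1
    for gid in range(global_size):  # round-robin assignment (an assumption)
        kernel(cores[gid % len(cores)], gid)
    for core in cores:  # kernel end: make results visible in memory
        core.flush()
        actions += 1
    return actions


# Usage: vector addition c = a + b, laid out as [a | b | c] in memory.
N = 8
memory = list(range(N)) + [10 * i for i in range(N)] + [0] * N
cores = [Core(i, memory) for i in range(4)]


def vec_add(core, gid):
    core.store(2 * N + gid, core.load(gid) + core.load(N + gid))


actions = run_kernel(cores, vec_add, N)
print(memory[2 * N:])  # [0, 11, 22, 33, 44, 55, 66, 77]
print(actions)         # 8
```

The kernel performs 24 loads and stores, but the runtime issues only 8 coherence actions: one invalidate and one flush per core. Before the final flushes, the host-visible region of `memory` that holds `c` would still contain zeros. Real hardware would operate on cache lines and use the SCC's own mechanisms. This model only illustrates why a weak consistency model keeps coherence traffic low.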