In this paper, we propose OMPICUDA, a program development toolkit for hybrid CPU/GPU clusters. With this toolkit, users can develop applications for a hybrid CPU/GPU cluster in a familiar programming model, namely a combination of OpenMP and MPI, instead of mixing CUDA with MPI or software distributed shared memory (SDSM). In addition, an extended device directive lets them select the type of resource that executes each parallel region of a program according to the properties of that region. The toolkit also provides a set of data-partition interfaces with which users can achieve load balance at the application level, regardless of which type of resource executes their programs.
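The abstract does not show the data-partition interfaces themselves, so the following is only a minimal sketch of the general idea of application-level load balancing: splitting an iteration space into contiguous chunks sized in proportion to the relative speed of each worker. The names `range_t` and `partition_by_weight`, and the use of a per-worker weight array, are assumptions made for illustration and are not part of OMPICUDA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration only (not an OMPICUDA type):
 * a half-open range [begin, end) of loop iterations. */
typedef struct {
    size_t begin;
    size_t end;
} range_t;

/* Hypothetical helper (not an OMPICUDA interface).
 * Splits the iterations [0, n) into nworkers contiguous chunks whose
 * sizes are roughly proportional to weight[i], where weight[i] is an
 * assumed relative speed of worker i (for example, a GPU node could be
 * given a larger weight than a CPU-only node). */
void partition_by_weight(size_t n, const double *weight,
                         size_t nworkers, range_t *out)
{
    assert(nworkers > 0);

    double total = 0.0;
    for (size_t i = 0; i < nworkers; i++) {
        assert(weight[i] >= 0.0);
        total += weight[i];
    }
    assert(total > 0.0);

    double acc = 0.0;   /* running sum of weights seen so far */
    size_t begin = 0;   /* start of the next chunk */
    for (size_t i = 0; i < nworkers; i++) {
        acc += weight[i];

        /* The last worker always ends at n so that no iteration is
         * lost to floating-point rounding. */
        size_t end = (i + 1 == nworkers)
                         ? n
                         : (size_t)((double)n * (acc / total));

        /* Keep chunk boundaries monotonic and inside [0, n]. */
        if (end < begin) end = begin;
        if (end > n) end = n;

        out[i].begin = begin;
        out[i].end = end;
        begin = end;
    }
}
```

In an MPI+OpenMP program, each rank could call such a helper with the same weights and then run its parallel loop over `out[rank]` only. How the weights are obtained (static configuration or runtime measurement) is left open here, since the abstract does not specify it.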