LibWater: heterogeneous distributed computing made easy

Authors:
Ivan Grasso;Simone Pellegrini;Biagio Cosenza;Thomas Fahringer
Affiliations:
University of Innsbruck, Innsbruck, Austria;University of Innsbruck, Innsbruck, Austria;University of Innsbruck, Innsbruck, Austria;University of Innsbruck, Innsbruck, Austria
Venue:
Proceedings of the 27th international ACM conference on International conference on supercomputing
Year:
2013

Citing 16
Cited 1

Efficient algorithms for all-to-all communications in multi-port message-passing systems

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Performance analysis of MPI collective operations

Cluster Computing
The deep computing messaging framework: generalized scalable message passing on the blue gene/P supercomputer

Proceedings of the 22nd annual international conference on Supercomputing
Two-tree algorithms for full bandwidth broadcast, reduction and scan

Parallel Computing
A static task partitioning approach for heterogeneous systems using OpenCL

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Hybrid OpenCL: Enhancing OpenCL for Distributed Processing

ISPA '11 Proceedings of the 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications
Enabling task-level scheduling on heterogeneous platforms

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters

Proceedings of the 26th ACM international conference on Supercomputing
Generalizing the Utility of GPUs in Large-Scale Heterogeneous Computing Systems

IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
dOpenCL: Towards a Uniform Programming Approach for Distributed Heterogeneous Multi-/Many-Core Systems

IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Productive Programming of GPU Clusters with OmpSs

IPDPS '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Runtime detection and optimization of collective communication patterns

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
OpenCL Remote: Extending OpenCL Platform Model to Network Scale

HPCC '12 Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems
CUDASA: compute unified device and systems architecture

EG PGV'08 Proceedings of the 8th Eurographics conference on Parallel Graphics and Visualization
Automatic problem size sensitive task partitioning on heterogeneous parallel systems

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
An automatic input-sensitive approach for heterogeneous task partitioning

Proceedings of the 27th international ACM conference on International conference on supercomputing

An automatic input-sensitive approach for heterogeneous task partitioning

Proceedings of the 27th international ACM conference on International conference on supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Clusters of heterogeneous nodes composed of multi-core CPUs and GPUs are increasingly being used for High Performance Computing (HPC) due to the benefits in peak performance and energy efficiency. In order to fully harvest the computational capabilities of such architectures, application developers often employ a combination of different parallel programming paradigms (e.g. OpenCL, CUDA, MPI and OpenMP), also known in literature as hybrid programming, which makes application development very challenging. Furthermore, these languages offer limited support to orchestrate data and computations for heterogeneous systems. In this paper, we present libWater, a uniform approach for programming distributed heterogeneous computing systems. It consists of a simple interface, compliant with the OpenCL programming model, and a runtime system which extends the capabilities of OpenCL beyond single platforms and single compute nodes. libWater enhances the OpenCL event system by enabling inter-context and inter-node device synchronization. Furthermore, libWater's runtime system uses dependency information enforced by event synchronization to dynamically build a DAG of enqueued commands which enables a class of advanced runtime optimizations. The detection and optimization of collective communication patterns is an example which, as shown by experimental results, improves the efficiency of the libWater runtime system for several application codes.