Scaling analytics applications with OpenCL for loosely coupled heterogeneous clusters

Authors:
Toshio Suganuma;Rajaram B. Krishnamurthy;Moriyoshi Ohara;Toshio Nakatani
Affiliations:
IBM Research - Tokyo, Toyosu, Koto-ku, Tokyo, Japan;IBM Systems and Technology Group, Poughkeepsie, NY;IBM Research - Tokyo, Toyosu, Koto-ku, Tokyo, Japan;IBM Research - Tokyo, Toyosu, Koto-ku, Tokyo, Japan
Venue:
Proceedings of the ACM International Conference on Computing Frontiers
Year:
2013

Abstract

OpenCL is an open standard for heterogeneous parallel programming that exploits multi-core CPUs, GPUs, or other accelerators as parallel computing resources. Recent work has extended the OpenCL parallel programming model to distributed heterogeneous clusters. For such loosely coupled acceleration architectures, designing OpenCL programs for maximum performance is quite different from designing them for conventional tightly coupled acceleration platforms. This paper describes our experience in OpenCL programming to extract scalable performance in a distributed heterogeneous cluster environment. We picked two real-world analytics workloads, Two-Step Cluster and Linear Regression, that pose different challenges for efficient OpenCL implementations. We obtained scalable performance on this architecture by carefully managing the amount of data and computation in the kernel program design and by addressing network latency through optimizations.