Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system

Authors:
Linchuan Li;Xingjian Li;Guangming Tan;Mingyu Chen;Peiheng Zhang
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Venue:
Proceedings of the 20th international symposium on High performance distributed computing
Year:
2011

Abstract

Heterogeneous architectures are becoming an important way to build massively parallel computer systems, as shown by the CPU-GPU heterogeneous systems ranked in the Top500 list. However, it is challenging to efficiently exploit the massive parallelism of both applications and architectures on such systems. In this paper we describe our experience in exploiting and orchestrating parallelism at the algorithm level to take advantage of the underlying parallelism at the architecture level. We select cryo-EM 3D reconstruction, a potential petaflops application, as an example. We exploit all available parallelism in cryo-EM 3D reconstruction and use a self-adaptive dynamic scheduling algorithm to map the application's parallelism onto the architecture. The parallelized programs are evaluated on a subsystem of the Dawning Nebulae supercomputer, in which each node has two six-core Intel Xeon CPUs and one Nvidia Fermi GPU. The experiments confirm that hierarchical parallelism is an efficient parallel programming pattern for utilizing the capabilities of both CPU and GPU in a heterogeneous system. The CUDA kernels run more than 3 times faster than their OpenMP counterparts using 12 cores (threads). Compared with the GPU-only version, the hybrid CPU-GPU program improves whole-application performance by a further 30% on average.
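
The abstract does not describe how the self-adaptive dynamic scheduling algorithm works. As a rough illustration of the general idea only, the sketch below simulates two workers of unequal speed pulling chunks of work from a shared pool, with each chunk sized according to a running estimate of that worker's throughput. The names `Worker` and `schedule`, the `base_chunk` parameter, the 0.5 smoothing factor, and the 1:3 speed ratio are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    rate: float              # hypothetical true throughput (tasks per second)
    est_rate: float = 1.0    # scheduler's running estimate of the throughput
    busy_until: float = 0.0  # simulated time at which the worker becomes idle
    done: int = 0            # number of tasks completed so far


def schedule(total_tasks, workers, base_chunk=4):
    """Simulate adaptive dynamic scheduling and return the makespan in seconds."""
    remaining = total_tasks
    while remaining > 0:
        # The worker that becomes idle first requests the next chunk.
        w = min(workers, key=lambda x: x.busy_until)
        # Size the chunk by the worker's estimated share of total throughput.
        share = w.est_rate / sum(x.est_rate for x in workers)
        chunk = max(1, min(remaining, round(base_chunk * len(workers) * share)))
        # In a real system this time would be measured, not computed.
        elapsed = chunk / w.rate
        w.busy_until += elapsed
        w.done += chunk
        remaining -= chunk
        # Adapt: blend the observed throughput into the running estimate.
        w.est_rate = 0.5 * w.est_rate + 0.5 * (chunk / elapsed)
    return max(x.busy_until for x in workers)


if __name__ == "__main__":
    cpu = Worker("cpu", rate=1.0)  # assumed relative speeds, for illustration
    gpu = Worker("gpu", rate=3.0)
    makespan = schedule(1000, [cpu, gpu])
    print(f"cpu={cpu.done} gpu={gpu.done} makespan={makespan:.1f}s")
```

In this toy model the faster worker ends up receiving larger chunks, so both workers finish at about the same time and the combined run is shorter than using the faster worker alone. That is the qualitative effect the abstract reports for the hybrid CPU-GPU program over the GPU-only version.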