A coarse-grained stream architecture for cryo-electron microscopy images 3D reconstruction

  • Authors:
  • Wendi Wang;Bo Duan;Wen Tang;Chunming Zhang;Guangming Tang;Peiheng Zhang;Ninghui Sun

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China (all authors)

  • Venue:
  • Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
  • Year:
  • 2012

Abstract

The wide acceptance of bioinformatics, medical imaging and multimedia applications, which have a data-centric flavor, requires more efficient, application-specific systems to be built. Recent advances in FPGA technology have led to a resurgence in research on accelerator designs that use FPGAs to speed up large-scale scientific applications. In this paper, we follow this trend and provide a proof-of-concept and comprehensive case study of FPGA-based accelerator design for a single-particle 3D reconstruction application in single-precision floating-point format. The proposed stream architecture is built by first offloading compute-intensive software kernels to dedicated hardware modules, a step that emphasizes the importance of optimizing computation-dominated data access patterns. Configurable computing streams are then constructed by arranging the hardware modules and bypass channels into a deep linear pipeline. The efficiency of the proposed stream architecture is supported by the reported 2.54-times speedup over a 4-core CPU. In terms of power efficiency, our FPGA-based accelerator achieves 7.33-times and 3.4-times improvements over a 4-core CPU and an up-to-date GPU device, respectively.
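
The idea of a configurable computing stream, in which modules are chained into a linear pipeline and any module can be skipped through a bypass channel, can be illustrated with a minimal software sketch. This is not the authors' hardware design: the names `run_stage` and `build_stream` and the three stand-in kernels (`scale`, `offset`, `square`) are hypothetical and chosen only for illustration. Python generators play the role of streaming modules, and a per-module enable flag plays the role of the bypass channel.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Stage = Callable[[float], float]


def run_stage(stage: Stage, stream: Iterable[float]) -> Iterator[float]:
    """Apply one module to each element as it streams past."""
    for item in stream:
        yield stage(item)


def build_stream(stages: List[Tuple[Stage, bool]],
                 source: Iterable[float]) -> Iterator[float]:
    """Chain modules into a linear pipeline.

    Each entry is (module, enabled). A disabled module is bypassed,
    so data flows through that position unchanged.
    """
    stream: Iterator[float] = iter(source)
    for stage, enabled in stages:
        if enabled:
            stream = run_stage(stage, stream)
        # else: bypass channel, the stream is passed on as-is
    return stream


# Hypothetical stand-in kernels (assumptions, not kernels from the paper).
def scale(x: float) -> float:
    return 2.0 * x


def offset(x: float) -> float:
    return x + 1.0


def square(x: float) -> float:
    return x * x


data = [1.0, 2.0, 3.0]

# All three modules active.
full = list(build_stream([(scale, True), (offset, True), (square, True)], data))

# Same pipeline with the middle module bypassed.
bypassed = list(build_stream([(scale, True), (offset, False), (square, True)], data))

print(full)      # [9.0, 25.0, 49.0]
print(bypassed)  # [4.0, 16.0, 36.0]
```

Reconfiguring the stream only changes the enable flags, not the order of the modules, which mirrors the fixed linear arrangement described in the abstract. In hardware the stages would run concurrently on successive data items, whereas this sketch only models the dataflow.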