Parallel-beam backprojection: an FPGA implementation optimized for medical imaging
FPGA '02 Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays
Hardware/Software Approach to Molecular Dynamics on Reconfigurable Computers
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Proceedings of the 30th international conference on Software engineering
Optimus: efficient realization of streaming applications on FPGAs
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
PFetch: software prefetching exploiting temporal predictability of memory access streams
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Single-particle 3d reconstruction from cryo-electron microscopy images on GPU
Proceedings of the 23rd international conference on Supercomputing
Axel: a heterogeneous cluster with FPGAs and GPUs
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Experience of parallelizing cryo-EM 3D reconstruction on a CPU-GPU heterogeneous system
Proceedings of the 20th international symposium on High performance distributed computing
Hi-index | 0.00 |
The wide acceptance of bioinformatics, medical imaging and multimedia applications, which have a data-centric favor to them, require more efficient and application-specific systems to be built. Due to the advances in modern FPGA technologies recently, there has been a resurgence in research aimed at accelerator design that leverages FPGAs to accelerate large-scale scientific applications. In this paper, we exploit this trend towards FPGA-based accelerator design and provide a proof-of-concept and comprehensive case study on FPGA-based accelerator design for a single-particle 3D reconstruction application in single-precision floating-point format. The proposed stream architecture is built by first offloading computing-intensive software kernels to dedicated hardware modules, which emphasizes the importance of optimizing computing dominated data access patterns. Then configurable computing streams are constructed by arranging the hardware modules and bypass channels to form a linear deep pipeline. The efficiency of the proposed stream architecture is justified by the reported 2.54 times speedup over a 4-cores CPU. In terms of power efficiency, our FPGA-based accelerator introduces a 7.33 and 3.4 times improvement over a 4-cores CPU and an up-to-date GPU device, respectively.