High speed 3D tomography on CPU, GPU, and FPGA

Authors:
Nicolas Gac; Stéphane Mancini; Michel Desvignes; Dominique Houzet
Affiliations:
Grenoble-Images-Parole-Signal-Automatique Laboratoire, Grenoble Institute of Technology and Equipes Traitement des Images et du Signal, Centre National de la Recherche Scientifique, ENSEA, Univers ...;Grenoble-Images-Parole-Signal-Automatique Laboratoire, Grenoble Institute of Technology, Grenoble Cedex, France;Grenoble-Images-Parole-Signal-Automatique Laboratoire, Grenoble Institute of Technology, Grenoble Cedex, France;Grenoble-Images-Parole-Signal-Automatique Laboratoire, Grenoble Institute of Technology, Grenoble Cedex, France
Venue:
EURASIP Journal on Embedded Systems - Special issue on design and architectures for signal and image processing
Year:
2008

Abstract

Back-projection (BP) is a computationally costly step in tomographic image reconstruction, for instance in positron emission tomography (PET). To reduce its computation time, this paper presents a pipelined, prefetch, and parallelized architecture for PET BP (3PA-PET). The key feature of this architecture is an original memory access strategy that masks the high latency of the external memory; without it, the pattern of memory references to the acquired data would stall the processing unit. The memory access bottleneck is overcome by efficiently exploiting the intrinsic temporal and spatial locality of the BP algorithm. Loop reordering enables efficient use of general-purpose processor caches in software implementations, and of the 3D predictive and adaptive cache (3D-AP cache) in hardware implementations. Parallel hardware pipelines also run efficiently thanks to a hierarchical 3D-AP cache: each pipeline performs about one memory reference per clock cycle, reaching a computational throughput close to 100%. The 3PA-PET architecture is prototyped on a system on programmable chip (SoPC) to validate the system and measure its expected performance. Execution times are compared with those of a desktop PC, a workstation, and a graphics processing unit (GPU).
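The effect of loop reordering on back-projection can be illustrated with a small sketch. This is not the 3PA-PET algorithm: it assumes a simplified 2D parallel-beam geometry (rather than 3D PET), pixel-driven back-projection with linear interpolation, and a tiled loop order chosen here purely for illustration. The function names `backproject_naive`, `backproject_tiled`, and `_sample`, the `tile` parameter, and the out-of-range rule are hypothetical. Both functions add the same contributions to each pixel in the same angle order, so they return the same image; only the order in which image and sinogram data are touched changes. Python is used for readability, not speed.

```python
import math


def _sample(row, t):
    """Linearly interpolate a detector row at fractional bin position t.

    Hypothetical boundary rule: if the two neighbouring bins are not both
    inside the row, the sample contributes 0.0.
    """
    i = math.floor(t)
    if i < 0 or i + 1 >= len(row):
        return 0.0
    w = t - i
    return (1.0 - w) * row[i] + w * row[i + 1]


def backproject_naive(sino, n, angles):
    """Angle-outer, pixel-driven back-projection (2D parallel beam).

    sino[a][t] is detector bin t of projection a; returns an n x n image.
    For each angle, the whole image is swept once.
    """
    img = [[0.0] * n for _ in range(n)]
    c = (n - 1) / 2.0                 # image centre, in pixel units
    tc = (len(sino[0]) - 1) / 2.0     # detector centre, in bin units
    for a, theta in enumerate(angles):
        cs, sn = math.cos(theta), math.sin(theta)
        row = sino[a]
        for y in range(n):
            for x in range(n):
                t = (x - c) * cs + (y - c) * sn + tc
                img[y][x] += _sample(row, t)
    return img


def backproject_tiled(sino, n, angles, tile=4):
    """Same result with reordered loops: pixel tiles outer, angles inner.

    A tile x tile block of pixels receives contributions from every angle
    before the next block is visited, so the block can stay cache-resident.
    """
    img = [[0.0] * n for _ in range(n)]
    c = (n - 1) / 2.0
    tc = (len(sino[0]) - 1) / 2.0
    trig = [(math.cos(theta), math.sin(theta)) for theta in angles]
    for y0 in range(0, n, tile):
        for x0 in range(0, n, tile):
            for a, (cs, sn) in enumerate(trig):
                row = sino[a]
                for y in range(y0, min(y0 + tile, n)):
                    for x in range(x0, min(x0 + tile, n)):
                        t = (x - c) * cs + (y - c) * sn + tc
                        img[y][x] += _sample(row, t)
    return img
```

In the angle-outer order, every projection sweeps the whole image, so the image is streamed through the cache once per angle. In the tiled order, a small block of pixels can stay cache-resident while all angles are accumulated into it, and for each angle the block reads only a narrow band of detector bins. Whether this pays off depends on image, sinogram, and cache sizes; the paper's own reordering and its 3D-AP cache target the 3D PET access pattern, which this sketch does not reproduce.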