Pushing the limits for medical image reconstruction on recent standard multicore processors

Authors:
Jan Treibig; Georg Hager; Hannes G. Hofmann; Joachim Hornegger; Gerhard Wellein
Affiliations:
Erlangen Regional Computing Center, Erlangen, Germany; Erlangen Regional Computing Center, Erlangen, Germany; Pattern Recognition Lab, University Erlangen-Nuremberg, Germany; Pattern Recognition Lab, University Erlangen-Nuremberg, Germany; Erlangen Regional Computing Center, Erlangen, Germany
Venue:
International Journal of High Performance Computing Applications
Year:
2013

Abstract

Volume reconstruction by backprojection is the computational bottleneck in many interventional clinical computed tomography (CT) applications. Today, vendors in this field are replacing special-purpose hardware accelerators with standard hardware such as multicore chips and GPGPUs. Medical imaging algorithms are on the verge of employing high-performance computing (HPC) technology and are therefore an interesting new candidate for optimization. This paper presents low-level optimizations for the backprojection algorithm, guided by a thorough performance analysis on four generations of Intel multicore processors (Harpertown, Westmere, Westmere EX, and Sandy Bridge). We choose the RabbitCT benchmark, a standardized test case well supported in industry, to ensure transparent and comparable results. Our aim is not only to provide the fastest possible implementation but also to compare it with performance models and hardware counter data in order to fully understand the results. We separate the influence of algorithmic optimizations, parallelization, SIMD vectorization, and microarchitectural issues, and we pinpoint problems with current SIMD instruction set extensions on standard CPUs (SSE, AVX). The use of assembly language is mandatory for best performance. Finally, we compare our results to the best GPGPU implementations available for this open competition benchmark.