As today's standard screening methods frequently fail to detect breast cancer before metastases have developed, early diagnosis is still a major challenge. With the promise of high-quality volume images, three-dimensional ultrasound computer tomography is likely to improve this situation, but it is computationally demanding. In this work, we investigate accelerating the ray-based transmission reconstruction with a GPU-based implementation of the iterative numerical optimization algorithm TVAL3. We identified the regular and the transposed sparse matrix-vector multiplication as the performance-limiting operations. For accelerated reconstruction, we propose two different concepts and devise a hybrid scheme as the optimal configuration. In addition, we investigate multi-GPU scalability and derive the optimal number of devices for our two primary use cases: a fast preview mode and a high-resolution mode. To obtain a fair estimate of the speedup, we compare our implementation to an optimized CPU version of the algorithm. Using our accelerated implementation, we reconstructed a preview 3D volume with 24,576 unknowns, a voxel size of (8 mm)^3 and approximately 200,000 equations in 0.5 s. A high-resolution volume with 1,572,864 unknowns, a voxel size of (2 mm)^3 and approximately 1.6 million equations was reconstructed in 23 s. This is a speedup of more than one order of magnitude over the optimized CPU version.
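To make the two performance-limiting operations concrete, the following is a minimal sketch of a regular and a transposed sparse matrix-vector product. It is not the GPU implementation described above. The compressed sparse row (CSR) layout, the function names `csr_matvec` and `csr_rmatvec`, and the toy 3x4 matrix are all assumptions made for illustration, since the abstract does not state which storage format is used.

```python
def csr_matvec(n_rows, indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed layout)."""
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Entries of this row occupy data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc  # each output element is written by exactly one row
    return y


def csr_rmatvec(n_cols, indptr, indices, data, y):
    """Compute x = A^T @ y from the same CSR arrays, without building A^T."""
    x = [0.0] * n_cols
    n_rows = len(indptr) - 1
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            # Scatter: different rows may add into the same output element.
            x[indices[k]] += data[k] * y[row]
    return x


# Hypothetical toy system: 3 equations (rows), 4 unknowns (columns).
# A = [[1, 0, 2, 0],
#      [0, 3, 0, 0],
#      [4, 0, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 3]
data = [1.0, 2.0, 3.0, 4.0, 5.0]

forward = csr_matvec(3, indptr, indices, data, [1.0, 1.0, 1.0, 1.0])
backward = csr_rmatvec(4, indptr, indices, data, [1.0, 1.0, 1.0])
```

In this sketch the regular product gathers: each row accumulates into its own output element, so rows can be processed independently. The transposed product scatters into shared output elements, so a parallel version needs either atomic updates or a separately stored transpose. That asymmetry is one plausible reason the two operations call for different treatment, although the abstract does not say how the two proposed concepts differ.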