The advent of new matrix-valued magnetic resonance imaging modalities such as Diffusion Tensor Imaging (DTI) calls for extensive computational acceleration. Acceleration on graphics processing units (GPUs) can make the regularization (denoising) of DTI images practical in clinical settings, improving the quality of DTI images in a broad range of applications. Constructing a DTI image requires direction-specific Magnetic Resonance (MR) measurements, and compared with conventional MR, this direction-sensitive acquisition has a lower signal-to-noise ratio (SNR). High noise levels therefore often limit DTI imaging. Advanced post-processing of the imaging data can improve the quality of the estimated tensors; however, the post-processing problem becomes substantially more computationally demanding for matrix-valued imaging data. This paper describes the acceleration of a Total Variation (TV) regularization method for matrix-valued images, in particular DTI images, on an NVIDIA Quadro FX 5600. The TV regularization of a 3-D image with 128³ voxels achieves a 266× speedup, completing in 1 minute and 30 seconds on the Quadro, while the same algorithm takes more than 3 hours on a dual-core CPU. In this application study we analyze the effect of excessive synchronization, which provides insight into adapting variational methods to the GPU architecture for other image-processing algorithms designed for matrix-valued images.