CUDA (Compute Unified Device Architecture) is a technology for general-purpose computing on the GPU (Graphics Processing Unit) that enables users to develop general-purpose GPU programs with relative ease. This paper analyzes the distinctive features of CUDA GPUs and summarizes the general CUDA programming model. Furthermore, we implement several classical image processing algorithms in CUDA, including histogram equalization, cloud removal, edge detection, and DCT encoding/decoding, with particular focus on the first two. Excluding the data transfer time between host memory and device memory, and as image size increases, histogram computation achieves a speedup of more than 40x, cloud removal about 79x, DCT around 8x, and edge detection more than 200x.
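The paper does not reproduce its CUDA kernels here, but the histogram equalization it accelerates follows the standard histogram/CDF/remap pipeline. A minimal CPU-side sketch of that algorithm (NumPy; the function name `equalize_histogram` is illustrative, not from the paper) shows the computation that would be parallelized on the GPU:

```python
import numpy as np

def equalize_histogram(image):
    """Equalize an 8-bit grayscale image (standard algorithm sketch)."""
    # Step 1: 256-bin histogram. On the GPU this is the step done with
    # per-block sub-histograms and atomic adds.
    hist = np.bincount(image.ravel(), minlength=256)
    # Step 2: cumulative distribution function.
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first non-zero bin
    # Step 3: build the remapping lookup table, normalized to [0, 255].
    lut = np.round((cdf - cdf_min) / (image.size - cdf_min) * 255.0)
    lut = lut.astype(np.uint8)
    # Step 4: remap every pixel through the table.
    return lut[image]
```

On a GPU, steps 1 and 4 are embarrassingly parallel over pixels, while the short 256-element CDF in step 2 is the only sequential portion, which is why the paper reports large speedups once host-device transfer time is excluded.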