With the development of the Internet and cloud computing, multimedia data such as images and videos have become one of the most common types of data being processed. As the volume of multimedia data continues to grow, it is vitally important to extract useful information from it efficiently. However, because their core algorithms are complex, multimedia retrieval applications are not only data-intensive but also compute-intensive. Accelerating such applications to meet real-time requirements has therefore become a major challenge. Since the Graphics Processing Unit (GPU) entered the general-purpose computing domain (GPGPU), it has become one of the most popular accelerators for applications with real-time requirements. In this paper, we parallelize SURF (Speeded-Up Robust Features) on GPGPU. SURF is a widely used feature extraction algorithm and the core of many video and image retrieval applications. We first analyze the parallelism within SURF to ensure that enough tasks are available to occupy the large-scale computation resources of the GPGPU. We then exploit inherent GPGPU characteristics, such as 2D memory, to further boost performance. Finally, we optimize the cooperation between the CPU and the GPGPU, an aspect generally ignored in previous designs. Experimental results show that our parallelized and optimized implementation achieves a throughput of 340.5 frames/s on an NVIDIA GTX 295 GPGPU, which is 15x faster than the most highly optimized CPU version. Compared with CUDA SURF, a state-of-the-art parallelization of SURF on GPGPU, our system achieves a 2.3x speedup.
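SURF's detector approximates second-order Gaussian derivatives with box filters evaluated on an integral image (summed-area table). This makes every box sum a constant-time lookup and leaves the per-pixel filter responses independent of one another, which is one kind of data parallelism a GPU port can exploit. The minimal Python sketch below illustrates only that idea. It is not the authors' GPGPU implementation, and the function names `integral_image`, `box_sum`, and `box_filter_map` are hypothetical.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Return an (h+1) x (w+1) summed-area table with a zero first row/column.

    ii[y][x] holds the sum of img[0..y-1][0..x-1].
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of img[y][0..x]
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: Image, x0: int, y0: int, x1: int, y1: int) -> int:
    """Sum of img over the half-open box [x0, x1) x [y0, y1) in O(1) time."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def box_filter_map(img: Image, size: int) -> Image:
    """Apply a size x size box filter at every valid top-left position.

    Each output value reads only the (read-only) integral image, so the
    iterations are independent; on a GPU (hypothetically) each one could be
    assigned to its own thread instead of running in these nested loops.
    """
    ii = integral_image(img)
    h, w = len(img), len(img[0])
    return [
        [box_sum(ii, x, y, x + size, y + size) for x in range(w - size + 1)]
        for y in range(h - size + 1)
    ]
```

For example, with `img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `box_sum(integral_image(img), 1, 1, 3, 3)` returns 28 (5 + 6 + 8 + 9), and `box_filter_map(img, 2)` returns `[[12, 16], [24, 28]]`. Building the integral image once and then reusing it for many independent box sums is what lets the filter cost stay constant regardless of the box size.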