Current content-based video copy detection approaches mostly concentrate on visual cues and neglect audio information. In this paper, we tackle the video copy detection task using audio information, which is as important as visual information in multimedia processing. First, inspired by the bag-of-visual-words model, we propose a bag-of-audio-words (BoA) representation to characterize each audio frame. Unlike naive audio retrieval approaches based on single-feature modeling, BoA is a high-level model owing to its perceptual and semantic properties. Within the BoA model, a coherency vocabulary indexing structure is adopted to achieve more efficient and effective indexing than the single vocabulary of the standard BoW model. The coherency vocabulary takes advantage of multiple audio features by computing their co-occurrence across different feature spaces. By enforcing a tight coherency constraint across feature spaces, the coherency vocabulary makes the BoA model more discriminative and more robust to various audio transforms. A 2D Hough transform is then applied to aggregate scores from matched audio segments. The segments that fall into the peak bin are identified as the copy segments in the reference video. In addition, we perform video copy detection from both audio and visual cues using four late fusion strategies, to demonstrate the complementarity of audio and visual information in video copy detection. Extensive experiments are conducted on the large-scale TRECVID 2009 dataset, and competitive results are achieved.
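
The score aggregation step can be illustrated with a minimal sketch. The abstract does not specify how the 2D Hough space is parameterized, so the code below assumes one plausible choice: the two axes are the reference video identifier and the quantized time offset between a query segment and its matched reference segment. The function name `hough_peak`, the tuple layout of `matches`, and the `bin_width` parameter are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def hough_peak(matches, bin_width=1.0):
    """Vote matched segments into a 2D accumulator and return the peak bin.

    matches: iterable of (query_time, ref_id, ref_time, score) tuples
             (assumed layout, times in seconds).
    bin_width: width of a time-offset bin in seconds (assumed parameter).

    Returns (peak_bin, peak_score, segments_in_peak_bin), where peak_bin is
    (ref_id, offset_bin), or (None, 0.0, []) when there are no matches.
    """
    votes = defaultdict(float)    # accumulated score per (ref_id, offset_bin)
    members = defaultdict(list)   # matched segments that voted for each bin

    for query_time, ref_id, ref_time, score in matches:
        # Segments copied from the same source share a near-constant offset.
        offset_bin = int((ref_time - query_time) // bin_width)
        key = (ref_id, offset_bin)
        votes[key] += score
        members[key].append((query_time, ref_time))

    if not votes:
        return None, 0.0, []

    peak = max(votes, key=votes.get)
    return peak, votes[peak], members[peak]


if __name__ == "__main__":
    example = [
        (0.0, "A", 10.2, 1.0),
        (1.0, "A", 11.3, 1.0),
        (2.0, "A", 12.1, 0.5),
        (0.5, "B", 40.0, 0.9),   # isolated match in another video
        (3.0, "A", 50.0, 0.8),   # same video, inconsistent offset
    ]
    print(hough_peak(example))
```

Under this assumption, isolated or temporally inconsistent matches spread their votes over many bins, while a true copy concentrates its score in one bin, which is why the peak bin is taken as the detected copy.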