Most state-of-the-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. This representation, however, ignores spatial-temporal information, which is important for measuring similarity between videos. Directly incorporating such information into the video representation is computationally expensive for a large-scale dataset, in terms of both storage and similarity measurement; it is also static, ignoring how the discriminative power of visual words changes from query to query. To tackle these limitations, in this paper we propose to discover the Spatial-Temporal Correlations (STC) imposed by the query example to improve the BovW model for video retrieval. The STC, expressed as spatial proximity and relative motion coherence between different visual words, is crucial for identifying the discriminative power of the visual words. We develop a novel technique to emphasize the most discriminative visual words in similarity measurement, and incorporate this STC-based approach into the standard inverted-index architecture. Our approach is evaluated on the TRECVID2002 and CC_WEB_VIDEO datasets for two typical QBE video retrieval tasks. The experimental results demonstrate that it substantially improves both the BovW model and a state-of-the-art method that also utilizes spatial-temporal information for QBE video retrieval.
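The retrieval pipeline the abstract describes can be sketched in miniature: a video is reduced to a bag of visual-word IDs, candidates are scored through an inverted index, and per-query word weights (standing in for the paper's STC-derived discriminativeness, which this sketch does not compute) re-emphasize certain words. All function and variable names below are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def build_inverted_index(videos):
    """videos: {video_id: [visual_word_id, ...]}.
    Returns an inverted index {word_id: {video_id: term_count}}."""
    index = defaultdict(dict)
    for vid, words in videos.items():
        for w in words:
            index[w][vid] = index[w].get(vid, 0) + 1
    return index

def query(index, query_words, word_weights=None):
    """Rank videos by a weighted dot product of word histograms.

    word_weights is an optional per-query dict {word_id: weight}; in the
    paper's approach such weights would come from the spatial-temporal
    correlations observed in the query example (hypothetical here).
    """
    q_hist = defaultdict(int)
    for w in query_words:
        q_hist[w] += 1
    scores = defaultdict(float)
    for w, qc in q_hist.items():
        wt = 1.0 if word_weights is None else word_weights.get(w, 1.0)
        # Only videos containing word w are touched -- the inverted index
        # is what keeps large-scale scoring cheap.
        for vid, c in index.get(w, {}).items():
            scores[vid] += wt * qc * c
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

For example, boosting the weight of a visual word that the query's STC marks as discriminative promotes videos sharing that word, without rescoring the whole collection.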