Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval

  • Authors:
  • Lei Wang; Dawei Song; Eyad Elyan

  • Affiliations:
  • Robert Gordon University, Aberdeen, United Kingdom; The Open University, Milton Keynes, United Kingdom; Robert Gordon University, Aberdeen, United Kingdom

  • Venue:
  • Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12)
  • Year:
  • 2012

Abstract

Most state-of-the-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. This representation, however, ignores spatial-temporal information, which is important for measuring similarity between videos. Directly incorporating such information into the video representation for a large-scale dataset is computationally expensive, both in storage and in similarity measurement. It is also static, failing to reflect that the discriminative power of visual words varies across queries. To tackle these limitations, in this paper we propose to discover Spatial-Temporal Correlations (STC) imposed by the query example to improve the BovW model for video retrieval. The STC, expressed as spatial proximity and relative motion coherence between different visual words, is crucial for identifying the discriminative power of the visual words. We develop a novel technique to emphasize the most discriminative visual words in similarity measurement, and incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC_WEB_VIDEO datasets for two typical QBE video retrieval tasks. The experimental results demonstrate that it substantially improves the BovW model, as well as a state-of-the-art method that also utilizes spatial-temporal information for QBE video retrieval.
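
To make the retrieval pipeline concrete, below is a minimal, illustrative sketch of query-driven visual-word re-weighting over a standard inverted index. This is not the authors' implementation: the STC weight here is a hypothetical stand-in based only on spatial proximity of co-occurring words in the query example (the paper also exploits relative motion coherence between words), and all function and variable names are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict
import math

def spatial_proximity_weights(detections, radius=20.0):
    """Hypothetical STC surrogate: up-weight visual words that appear
    spatially close to other words in the query example. Each detection
    is a tuple (visual_word_id, x, y, frame)."""
    counts = Counter(w for w, _, _, _ in detections)
    near = Counter()
    for i, (wi, xi, yi, ti) in enumerate(detections):
        for wj, xj, yj, tj in detections[i + 1:]:
            # Count word pairs that co-occur in nearby frames and positions.
            if abs(ti - tj) <= 1 and math.hypot(xi - xj, yi - yj) <= radius:
                near[wi] += 1
                near[wj] += 1
    # Words participating in more nearby pairs get higher weights.
    return {w: 1.0 + near[w] / counts[w] for w in counts}

def build_index(videos):
    """Standard inverted index: visual word -> [(video_id, term_freq)]."""
    index = defaultdict(list)
    for vid, detections in videos.items():
        tf = Counter(w for w, _, _, _ in detections)
        for w, f in tf.items():
            index[w].append((vid, f))
    return index

def retrieve(query_detections, index, n_videos):
    """Rank videos by idf- and STC-weighted word overlap with the query;
    only postings lists of the query's words are touched, so the
    re-weighting fits the usual inverted-index query flow."""
    q_tf = Counter(w for w, _, _, _ in query_detections)
    stc = spatial_proximity_weights(query_detections)
    scores = defaultdict(float)
    for w, qf in q_tf.items():
        postings = index.get(w, [])
        if not postings:
            continue
        idf = math.log(n_videos / len(postings))
        for vid, f in postings:
            scores[vid] += stc[w] * idf * qf * f
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy usage: two "videos" as lists of (word_id, x, y, frame) detections.
videos = {
    "v1": [(3, 10, 10, 0), (7, 12, 11, 0), (3, 50, 60, 1)],
    "v2": [(5, 30, 30, 0), (9, 80, 20, 1)],
}
index = build_index(videos)
query = [(3, 15, 15, 0), (7, 17, 16, 0)]
print(retrieve(query, index, len(videos)))  # v1 ranks first
```

Because the weights are recomputed per query from the example itself, the same visual word can be emphasized for one query and discounted for another, which is the query-dependent behavior the static BovW representation lacks.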