Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval

Authors:
Yu-Gang Jiang; Chong-Wah Ngo
Affiliations:
Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Hong Kong (both authors)
Venue:
Computer Vision and Image Understanding
Year:
2009

Abstract

Bag-of-visual-words (BoW) has recently become a popular representation for describing video and image content. Most existing approaches, however, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and ontological aspects of visual words for video analysis. We propose two approaches, soft-weighting and constraint-based earth mover's distance (CEMD), to model different aspects of visual word linguistics and proximity. In soft-weighting, visual words are weighted so that the linguistic meaning of words is taken into account in bin-to-bin histogram comparison. In CEMD, a cross-bin matching algorithm is formulated so that the ground distance reflects the linguistic similarity of words. In particular, a BoW ontology that hierarchically specifies the hyponym relationships among words is constructed to assist the reasoning. We demonstrate soft-weighting and CEMD on two tasks: semantic video indexing and near-duplicate keyframe retrieval. Experimental results indicate that soft-weighting outperforms other popular weighting schemes, such as term frequency (TF) weighting, on a large-scale video database. In addition, CEMD compares favorably with cosine similarity in near-duplicate retrieval.
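The weakness of bin-to-bin comparison under hard TF assignment, and how soft-weighting mitigates it, can be illustrated with a small Python sketch. It assumes a rank-based soft assignment in which each keypoint votes for its `top_n` nearest visual words with weights 1, 1/2, 1/4, and so on. This is an assumption made for illustration: the paper's exact weighting, which may also involve a keypoint-to-word similarity term, is not reproduced here. All function names, the vocabulary, and the toy 2-D "descriptors" are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_words(descriptor: Vector, vocabulary: Sequence[Vector]) -> List[int]:
    """Indices of visual words ordered from nearest to farthest."""
    return sorted(range(len(vocabulary)),
                  key=lambda k: squared_distance(descriptor, vocabulary[k]))


def tf_histogram(descriptors: Sequence[Vector],
                 vocabulary: Sequence[Vector]) -> List[float]:
    """Hard assignment (TF): each descriptor adds 1 to its nearest word only."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[rank_words(d, vocabulary)[0]] += 1.0
    return hist


def soft_histogram(descriptors: Sequence[Vector],
                   vocabulary: Sequence[Vector],
                   top_n: int = 4) -> List[float]:
    """Assumed soft assignment: the i-th nearest word (i = 1..top_n)
    receives weight 1 / 2**(i - 1). Not necessarily the paper's formula."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        for rank, k in enumerate(rank_words(d, vocabulary)[:top_n]):
            hist[k] += 1.0 / (2 ** rank)  # rank is 0-based here
    return hist


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Bin-to-bin cosine similarity between two histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical 4-word vocabulary over toy 2-D descriptors.
    vocab = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    frame_a = [(0.4, 0.0)]  # nearest word: index 0
    frame_b = [(0.6, 0.0)]  # nearest word: index 1
    print("TF cosine:  ", cosine(tf_histogram(frame_a, vocab),
                                 tf_histogram(frame_b, vocab)))
    print("soft cosine:", cosine(soft_histogram(frame_a, vocab, 2),
                                 soft_histogram(frame_b, vocab, 2)))
```

In this toy case the two descriptors are nearly identical but fall on opposite sides of a word boundary, so their TF histograms share no bins and their cosine similarity is 0. With soft assignment (`top_n=2`), both keyframes also vote for the neighbouring word, and the cosine similarity rises to about 0.8. Cross-bin measures such as the CEMD described above address the same problem differently: they allow mass in one bin to be matched against related bins through a ground distance rather than by changing the histogram weights.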