Visual vocabulary optimization with spatial context for image annotation and classification

  • Authors:
  • Zhiguo Yang; Yuxin Peng; Jianguo Xiao

  • Affiliations:
  • Institute of Computer Science and Technology, Peking University, Beijing, China

  • Venue:
  • MMM'12: Proceedings of the 18th International Conference on Advances in Multimedia Modeling
  • Year:
  • 2012

Abstract

In this paper, we propose a new approach to visual vocabulary optimization that exploits spatial context, an important source of information that existing methods have not fully exploited. The novelty of our method lies in two aspects: when spatial information is considered, and how it is used. For the first aspect, existing methods generally consider spatial information only after the visual vocabulary is built, whereas we employ it during the construction of the visual vocabulary, producing a more accurate vocabulary. For the second aspect, unlike existing methods, which use spatial information to re-rank retrieval results, to generate local keypoint groups such as visual phrases, or within spatial pyramid matching kernels, we propose a novel method that employs spatial information as side information to constrain the construction of the visual vocabulary. Instead of simply assigning each keypoint to its nearest cluster center, we also take the keypoint's spatial context into account during clustering. With the proposed approach, a more accurate visual vocabulary can be generated, improving results on both image annotation and classification tasks. Experiments on the widely used 15-Scenes dataset demonstrate the effectiveness of the proposed approach.
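
The core idea, assigning each keypoint to a visual word based on both descriptor similarity and the current assignments of its spatial neighbors, can be illustrated with a spatially constrained k-means sketch. The Python below is a minimal illustration under stated assumptions, not the paper's exact formulation: the neighbor-voting penalty, the weight `lambda_spatial`, and the function name `spatially_constrained_kmeans` are all hypothetical choices made for this sketch.

```python
import numpy as np

def spatially_constrained_kmeans(descriptors, positions, k, n_neighbors=5,
                                 lambda_spatial=0.5, n_iters=20, seed=0):
    """Cluster keypoint `descriptors` (N x D) into k visual words, using the
    keypoint image `positions` (N x 2) so that spatially close keypoints
    prefer the same word. Descriptors are assumed roughly unit-normalized so
    that the appearance and spatial costs are on comparable scales."""
    rng = np.random.default_rng(seed)
    n = len(descriptors)
    centers = descriptors[rng.choice(n, size=k, replace=False)]

    # Precompute each keypoint's spatial neighbors once.
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]

    labels = np.zeros(n, dtype=int)
    for _ in range(n_iters):
        # Appearance cost: squared distance to every cluster center, (N, k).
        app_cost = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        # Spatial cost: fraction of neighbors currently assigned elsewhere,
        # so a word is cheap if the keypoint's neighbors already use it.
        neighbor_labels = labels[neighbors]                       # (N, n_neighbors)
        votes = np.stack([(neighbor_labels == c).mean(1) for c in range(k)], axis=1)
        spatial_cost = 1.0 - votes
        labels = np.argmin(app_cost + lambda_spatial * spatial_cost, axis=1)
        # Update centers; leave empty clusters where they are.
        for c in range(k):
            members = descriptors[labels == c]
            if len(members):
                centers[c] = members.mean(0)
    return centers, labels

# Example: 200 random SIFT-like descriptors with 2-D image positions.
descs = np.random.rand(200, 128).astype(np.float32)
pos = np.random.rand(200, 2)
centers, words = spatially_constrained_kmeans(descs, pos, k=16)
```

In this sketch, increasing `lambda_spatial` trades descriptor fidelity for spatial coherence of the resulting word assignments; the paper's actual constraint formulation and weighting may differ.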