Constrained keypoint quantization: towards better bag-of-words model for large-scale multimedia retrieval

  • Authors:
  • Yang Cai; Wei Tong; Linjun Yang; Alexander G. Hauptmann

  • Affiliations:
  • Zhejiang University, Hangzhou, China; Carnegie Mellon University, Pittsburgh; Microsoft Research Asia, Beijing, China; Microsoft Research Asia, Beijing, China

  • Venue:
  • Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
  • Year:
  • 2012


Abstract

Bag-of-words models are among the most widely used and successful representations in multimedia retrieval. However, the quantization error introduced when mapping keypoints to visual words is one of the main drawbacks of the bag-of-words model. Although techniques such as soft-assignment [23] and query expansion [27] have been introduced to address this problem, the performance gain always comes at the cost of longer query response time, which makes them difficult to apply in large-scale multimedia retrieval applications. In this paper, we propose a simple "constrained keypoint quantization" method which can effectively reduce the overall quantization error of the bag-of-words representation while greatly improving retrieval efficiency. The central idea of the proposed quantization method is that if a keypoint is far away from all visual words, we simply remove it. At first glance, this simple strategy seems naive and dangerous. However, we show that the proposed method has a solid theoretical foundation. Our experimental results on three widely used datasets for near-duplicate image and video retrieval confirm that, by removing a large number of keypoints with high quantization error, we obtain comparable or even better retrieval performance while dramatically boosting retrieval efficiency.
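
The core mechanism described in the abstract, dropping any keypoint whose nearest visual word is farther than some threshold, can be sketched roughly as follows. This is a minimal illustration under assumed inputs, not the authors' implementation: the vocabulary, the descriptors, and the threshold `max_dist` are placeholders, and the paper's actual constraint and threshold selection may differ.

```python
import numpy as np

def constrained_quantize(descriptors, vocabulary, max_dist):
    """Map local descriptors to visual-word indices, discarding any
    descriptor whose nearest visual word lies farther than max_dist.

    descriptors: (n, d) array of keypoint descriptors (e.g. SIFT).
    vocabulary:  (k, d) array of visual-word centroids.
    max_dist:    distance threshold above which a keypoint is removed.
    Returns the list of retained visual-word indices (the bag of words).
    """
    # Pairwise Euclidean distances between descriptors and visual words.
    dists = np.linalg.norm(
        descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)      # closest visual word per keypoint
    nearest_dist = dists.min(axis=1)    # distance to that word
    keep = nearest_dist <= max_dist     # the "constraint": drop distant keypoints
    return nearest[keep].tolist()

# Toy usage with random data and an assumed threshold.
rng = np.random.default_rng(0)
vocab = rng.normal(size=(100, 128))   # hypothetical 100-word vocabulary
descs = rng.normal(size=(500, 128))   # hypothetical keypoint descriptors
words = constrained_quantize(descs, vocab, max_dist=15.0)
print(f"kept {len(words)} of {descs.shape[0]} keypoints")
```

Because removed keypoints never enter the inverted index, the resulting bag-of-words vectors are sparser, which is consistent with the efficiency gains the abstract reports.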