Recently, feature grouping has been proposed as a method for improving retrieval results for logos and web images. It relies on the idea that a group of features matching over a local region in an image is more discriminative than a single feature match. In this paper, we develop this concept further and apply it to the more challenging task of landmark recognition. We propose a novel combination of dense sampling of SIFT features with interest regions, which represent the more salient parts of the image in greater detail. In place of the conventional dense sampling used in category recognition, which computes features on a regular grid at a number of fixed scales, we allow the sampling density and scale to vary with the scale of the interest region. We develop new techniques for exploring stronger geometric constraints inside the feature groups and for computing the match score. The spatial information is stored efficiently in an inverted index structure. The proposed approach performs part-based matching of interest regions instead of matching entire images using a bag-of-words histogram. This helps reduce the influence of background clutter and works better under occlusion. Experiments show that directing more attention to the salient regions of the image and applying the proposed geometric constraints vastly improves recognition rates for reasonable vocabulary sizes.
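The idea of letting sampling density and scale follow the interest region can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `InterestRegion` type, the `sample_points` function, and the `points_per_side` and `patch_scale_ratio` parameters are hypothetical names and values chosen only for illustration. The sketch places a fixed number of sample points inside each region, so small regions are sampled more densely in pixel terms than large ones, and the descriptor patch scale grows with the region scale.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InterestRegion:
    """Hypothetical interest region: centre in pixels plus a characteristic scale."""
    x: float
    y: float
    scale: float  # treated here as the half-width of the region, in pixels


def sample_points(
    region: InterestRegion,
    points_per_side: int = 4,        # assumed value, not taken from the paper
    patch_scale_ratio: float = 0.5,  # assumed value, not taken from the paper
) -> List[Tuple[float, float, float]]:
    """Return (x, y, patch_scale) samples on a grid covering the region.

    The grid spacing and the patch scale are both proportional to the
    region scale, instead of being fixed for the whole image.
    """
    step = 2.0 * region.scale / points_per_side
    patch_scale = patch_scale_ratio * region.scale
    # Offset of the first grid cell centre relative to the region centre.
    start = -region.scale + step / 2.0
    points = []
    for row in range(points_per_side):
        for col in range(points_per_side):
            points.append(
                (region.x + start + col * step,
                 region.y + start + row * step,
                 patch_scale)
            )
    return points


small = sample_points(InterestRegion(x=100.0, y=50.0, scale=8.0))
large = sample_points(InterestRegion(x=100.0, y=50.0, scale=32.0))
```

Both regions yield the same number of samples, but neighbouring samples are 4 pixels apart in the small region and 16 pixels apart in the large one. A SIFT descriptor would then be computed at each returned location and patch scale.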