Contextual image retrieval model

  • Authors:
  • Linjun Yang, Bo Geng, Alan Hanjalic, Xian-Sheng Hua

  • Affiliations:
  • Microsoft Research Asia, Beijing, China; Peking University, Beijing, China; Delft University of Technology, Delft, The Netherlands

  • Venue:
  • Proceedings of the ACM International Conference on Image and Video Retrieval
  • Year:
  • 2010

Abstract

A state-of-the-art query-by-region image retrieval method typically works as follows. First, the user provides a query image and draws a bounding box to specify the region of interest (ROI). Then the visual words extracted from within the bounding box are used to formulate the query and retrieve images relevant with respect to the ROI. However, if the ROI is small and contains only a few visual words, the relevance estimation can be unreliable, leading to irrelevant results being returned. Since an object in an image seldom occurs in isolation, it often co-occurs with other objects, which can be said to form the search context. Following this paradigm, the visual words in the query image outside the bounding box can be regarded as context for the ROI and employed to improve retrieval performance. Motivated by this, we propose in this paper a contextual image retrieval model based on the language modeling approach. We treat the bounding box as an uncertain observation of the latent search intention and the saliency map detected for the query image as a prior. A search intention score is then inferred per visual word and used to weight the ROI and the context for a better estimation of the query language model. Experimental results on two datasets comprising 5K and 505K images respectively demonstrate the effectiveness of our approach. The proposed contextual image retrieval model achieves 5.5% and 6.9% performance improvements over the standard language modeling approach on the two datasets respectively.
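The core idea in the abstract — inferring a per-word search-intention score from the bounding box and a saliency prior, then using it to weight visual words when estimating the query language model — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual inference: the linear combination of bounding-box membership and saliency via `alpha`, and all function and parameter names, are assumptions for illustration.

```python
from collections import defaultdict

def query_language_model(visual_words, in_roi, saliency, alpha=0.7):
    """Estimate an intention-weighted query language model.

    visual_words: list of visual-word ids extracted from the query image
    in_roi:       list of bools, True if the word lies inside the bounding box
    saliency:     list of floats in [0, 1], saliency prior per visual word
    alpha:        assumed trade-off between bounding-box evidence and the
                  saliency prior (hypothetical, not the paper's formulation)
    """
    # Per-word search-intention score: bounding-box membership as an
    # uncertain observation, softened by the saliency prior.
    intention = [alpha * (1.0 if r else 0.0) + (1.0 - alpha) * s
                 for r, s in zip(in_roi, saliency)]

    # Aggregate intention mass per visual word, then normalize so the
    # result is a probability distribution (the query language model).
    weights = defaultdict(float)
    for word, score in zip(visual_words, intention):
        weights[word] += score
    total = sum(weights.values()) or 1.0
    return {word: w / total for word, w in weights.items()}
```

Under this sketch, context words outside the ROI still contribute to the query model, but with weight proportional to their saliency rather than zero, which is the behavior the abstract attributes to the contextual model.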