Contextual image retrieval model

  • Authors:
  • Linjun Yang, Bo Geng, Alan Hanjalic, Xian-Sheng Hua

  • Affiliations:
  • Microsoft Research Asia, Beijing, China; Peking University, Beijing, China; Delft University of Technology, Delft, The Netherlands

  • Venue:
  • Proceedings of the ACM International Conference on Image and Video Retrieval
  • Year:
  • 2010

Abstract

A state-of-the-art query-by-region image retrieval method typically works as follows. First, the user provides a query image and draws a bounding box to specify the region of interest (ROI). Then the visual words extracted from within the bounding box are used to formulate the query and retrieve images relevant with respect to the ROI. However, if the ROI is small and contains only a few visual words, the relevance estimation can be unreliable, leading to irrelevant results being returned. Since an object in an image seldom occurs in isolation, it often co-occurs with other objects, which can be said to form the search context. Following this paradigm, the visual words in the query image outside the bounding box can be regarded as context for the ROI and employed to improve retrieval performance. Motivated by this, we propose in this paper a contextual image retrieval model based on the language modeling approach. We treat the bounding box as an uncertain observation of the latent search intention and the saliency map detected for the query image as a prior. A search intention score is then inferred per visual word and used to weight the ROI and the context for a better estimation of the query language model. Experimental results on two datasets comprising 5K and 505K images respectively demonstrate the effectiveness of our approach. The proposed contextual image retrieval model achieves 5.5% and 6.9% performance improvements over the standard language modeling approach on the two datasets respectively.
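The core idea in the abstract — inferring a per-word search-intention score from the bounding box and a saliency prior, then using it to weight visual words when estimating the query language model — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual inference: the linear combination of bounding-box membership and saliency via `alpha`, and all function and parameter names, are assumptions for illustration.

```python
from collections import defaultdict

def query_language_model(visual_words, in_roi, saliency, alpha=0.7):
    """Estimate an intention-weighted query language model.

    visual_words: list of visual-word ids extracted from the query image
    in_roi:       list of bools, True if the word lies inside the bounding box
    saliency:     list of floats in [0, 1], saliency prior per visual word
    alpha:        assumed trade-off between bounding-box evidence and the
                  saliency prior (hypothetical, not the paper's formulation)
    """
    # Per-word search-intention score: bounding-box membership as an
    # uncertain observation, softened by the saliency prior.
    intention = [alpha * (1.0 if r else 0.0) + (1.0 - alpha) * s
                 for r, s in zip(in_roi, saliency)]

    # Aggregate intention mass per visual word, then normalize so the
    # result is a probability distribution (the query language model).
    weights = defaultdict(float)
    for word, score in zip(visual_words, intention):
        weights[word] += score
    total = sum(weights.values()) or 1.0
    return {word: w / total for word, w in weights.items()}
```

Under this sketch, context words outside the ROI still contribute to the query model, but with weight proportional to their saliency rather than zero, which is the behavior the abstract attributes to the contextual model.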