Visual & textual fusion for region retrieval: from both fuzzy matching and Bayesian reasoning aspects

  • Authors: Rongrong Ji; Hongxun Yao
  • Affiliations: Harbin Institute of Technology, Harbin, China (both authors)
  • Venue: Proceedings of the International Workshop on Multimedia Information Retrieval
  • Year: 2007

Abstract

This paper presents a novel visual & textual information fusion framework for region-based image retrieval. We explore linguistically integrated region retrieval from both Bayesian reasoning and fuzzy region matching perspectives. First, to associate textual information with image regions, we present a region-based soft annotation strategy: our method automatically labels each image region with multiple keywords, each assigned a confidence factor that indicates its annotation accuracy. To train the annotation classifier, we adopt a pairwise coupling (PWC) SVM bagging network, which addresses the problems of sample insufficiency and sample asymmetry. At retrieval time, we fuse regions' visual & textual information to rank image similarity at the perceptual level. The proposed framework explores two fusion schemes: (1) Semantic-Supervised Integrated Region Matching (SSIRM) and (2) Keyword-Integrated Bayesian Reasoning (KIBR). SSIRM is a keyword-integrated fuzzy region matching strategy adopted when the query image is pre-annotated; KIBR is adopted when the query image is unannotated or poorly annotated, and it supports both query-by-example and query-by-keyword based on a statistical text-image translation model. Finally, in relevance feedback (RF) learning, we exploit a unified visual & textual learning algorithm to precisely capture users' retrieval intention. Our experiments show superior annotation performance, superior retrieval performance (over IRM), and superior RF performance (over IRM + SVM at the region level, and over SVM, ALSVM, and ABSVM at the global level), demonstrating the effectiveness of the proposed fusion framework in bridging the semantic gap.
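
Since only the abstract is available here, the sketch below is purely illustrative rather than the authors' SSIRM or KIBR formulation: it shows, under stated assumptions, the general pattern of fusing a visual region similarity with confidence-weighted soft annotations into a single score. Every name in it (Region, fused_similarity, the weight alpha, the toy features and keywords) is a hypothetical placeholder, not drawn from the paper.

```python
# Minimal, hypothetical sketch of visual & textual region-similarity fusion.
# Not the paper's method; data structures and the fusion weight are assumptions.

from dataclasses import dataclass, field
from math import exp

@dataclass
class Region:
    visual_feat: list[float]  # e.g. a color/texture descriptor (assumed)
    # Soft annotation: keyword -> confidence in [0, 1] (assumed representation)
    keywords: dict[str, float] = field(default_factory=dict)

def visual_similarity(a: Region, b: Region) -> float:
    """Gaussian similarity on squared Euclidean distance between features."""
    d2 = sum((x - y) ** 2 for x, y in zip(a.visual_feat, b.visual_feat))
    return exp(-d2)

def textual_similarity(a: Region, b: Region) -> float:
    """Confidence-weighted overlap between two soft keyword annotations."""
    shared = set(a.keywords) & set(b.keywords)
    if not shared:
        return 0.0
    return sum(min(a.keywords[k], b.keywords[k]) for k in shared) / len(shared)

def fused_similarity(a: Region, b: Region, alpha: float = 0.5) -> float:
    """Linear fusion of the two cues; alpha is a hypothetical trade-off weight."""
    return alpha * visual_similarity(a, b) + (1 - alpha) * textual_similarity(a, b)

# Toy usage: two regions with different features but a shared, confident keyword.
q = Region([0.2, 0.8], {"sky": 0.9, "cloud": 0.4})
t = Region([0.6, 0.5], {"sky": 0.8})
print(f"visual={visual_similarity(q, t):.3f}  "
      f"textual={textual_similarity(q, t):.3f}  "
      f"fused={fused_similarity(q, t):.3f}")
```

The design point this illustrates is the one the abstract argues for: a confident shared keyword can raise the rank of a region pair whose visual features alone would score poorly, which is how textual annotation helps narrow the semantic gap.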