Cross-media manifold learning for image retrieval & annotation

  • Authors:
  • Xianming Liu; Rongrong Ji; Hongxun Yao; Pengfei Xu; Xiaoshuai Sun; Tianqiang Liu

  • Affiliations:
  • Harbin Institute of Technology, Harbin, China (all authors)

  • Venue:
  • MIR '08 Proceedings of the 1st ACM international conference on Multimedia information retrieval
  • Year:
  • 2008

Abstract

Fusing visual content with textual information is an effective approach for both content-based and keyword-based image retrieval. However, the performance of visual and textual fusion is strongly affected by noise and redundancy in both the textual modality (such as surrounding text in HTML pages) and the visual modality (such as intra-class diversity). This paper presents a manifold-based cross-media optimization scheme that achieves visual and textual fusion within a unified framework. We propose a cross-media manifold co-training mechanism between a keyword-based metric space and a vision-based metric space, which infers an optimal dual-space fusion by minimizing a manifold-based visual and textual energy criterion. We present Isomorphic Manifold Learning to map annotation effects from the image visual space onto the keyword semantic space via manifold shrinkage, and we prove its correctness and convergence mathematically. Retrieval can be performed with either keywords or example images, on the keyword-based and vision-based metric spaces respectively, where simple distance-based classifiers suffice. Two groups of experiments are conducted: the first on the Corel 5000 image database, comparing against state-of-the-art Generalized Manifold Ranking Based Image Retrieval and SVM; the second on a real-world Flickr dataset of over 6,000 images to verify effectiveness in a realistic setting. The promising results show that our model attains a significant improvement over state-of-the-art algorithms.
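
To make the dual-space idea concrete, below is a minimal Python sketch of graph-based manifold ranking applied separately to a visual and a textual feature space, with a simple weighted late fusion of the two score vectors. The abstract does not specify the paper's exact co-training or energy criterion, so this follows the standard manifold-ranking closed form instead; the function names, the fusion weight beta, and the Gaussian affinity choice are illustrative assumptions, not the authors' formulation.

    # Sketch: manifold ranking on two modalities with a simple late fusion.
    # All names and parameters here are assumptions for illustration only.
    import numpy as np

    def normalized_affinity(X, sigma=1.0):
        """Gaussian affinity with symmetric normalization S = D^-1/2 W D^-1/2."""
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.exp(-sq_dists / (2 * sigma ** 2))
        np.fill_diagonal(W, 0.0)                      # no self-loops
        d = np.maximum(W.sum(axis=1), 1e-12)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        return D_inv_sqrt @ W @ D_inv_sqrt

    def manifold_ranking(S, y, alpha=0.99):
        """Closed-form ranking scores f* = (1 - alpha) (I - alpha S)^-1 y."""
        n = S.shape[0]
        return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y)

    def dual_space_scores(X_visual, X_textual, query_idx, beta=0.5):
        """Rank all items against a query, fusing visual and textual manifolds."""
        y = np.zeros(X_visual.shape[0])
        y[query_idx] = 1.0                            # query indicator vector
        f_vis = manifold_ranking(normalized_affinity(X_visual), y)
        f_txt = manifold_ranking(normalized_affinity(X_textual), y)
        return beta * f_vis + (1 - beta) * f_txt      # simple weighted fusion

    # Usage: rank 100 items given 64-d visual and 32-d textual features.
    rng = np.random.default_rng(0)
    scores = dual_space_scores(rng.normal(size=(100, 64)),
                               rng.normal(size=(100, 32)),
                               query_idx=0)
    print(np.argsort(-scores)[:10])                   # top-10 retrieved indices

The closed-form solve is fine at this scale; for larger graphs one would typically use the iterative update f = alpha * S @ f + (1 - alpha) * y until convergence instead of a direct matrix solve.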