Bridging the gap between visual and auditory feature spaces for cross-media retrieval

  • Authors:
  • Hong Zhang; Fei Wu

  • Affiliations:
  • The Institute of Artificial Intelligence, Zhejiang University, Hangzhou, P.R. China (both authors)

  • Venue:
  • MMM'07: Proceedings of the 13th International Conference on Multimedia Modeling, Part I
  • Year:
  • 2007

Abstract

Cross-media retrieval is an interesting research problem that seeks to break through the limitation of modality, so that users can query multimedia objects with examples of a different modality. In this paper we present a novel approach to learning the underlying correlation between visual and auditory feature spaces for cross-media retrieval. A semi-supervised Correlation Preserving Mapping (SSCPM) is described to learn an isomorphic SSCPM subspace in which the canonical correlations between the original visual and auditory features are maximally preserved. Based on relevance feedback from user interactions, local semantic clusters are formed for images and audio clips respectively. By dynamically spreading the ranking scores of positive and negative examples, cross-media semantic correlations are refined and cross-media distance is accurately estimated. Experimental results are encouraging and show that our approach is effective.
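
The abstract describes projecting paired visual and auditory features into a shared subspace that preserves their canonical correlations, then ranking by distance in that subspace. The paper's actual SSCPM algorithm and its relevance-feedback refinement are not reproduced here; as a rough illustration of the underlying idea only, the following minimal Python sketch applies plain canonical correlation analysis (scikit-learn's CCA) to synthetic paired features. All data, dimensions, and parameters are hypothetical assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Hypothetical toy data: 200 paired image/audio feature vectors
# (standing in for, e.g., color/texture descriptors and audio descriptors).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 10))  # shared semantic factor (assumption)
X_img = latent @ rng.normal(size=(10, 64)) + 0.1 * rng.normal(size=(200, 64))
X_aud = latent @ rng.normal(size=(10, 32)) + 0.1 * rng.normal(size=(200, 32))

# Learn paired projections that maximize correlation between the two views,
# mapping both modalities into a common (isomorphic) subspace.
cca = CCA(n_components=10)
cca.fit(X_img, X_aud)
Z_img, Z_aud = cca.transform(X_img, X_aud)

# Cross-media retrieval: query with an audio clip, rank images by
# Euclidean distance in the shared subspace.
query = Z_aud[0]
dists = np.linalg.norm(Z_img - query, axis=1)
ranking = np.argsort(dists)
print("Top-5 images for audio query 0:", ranking[:5].tolist())
```

In the paper's setting, the synthetic features would be replaced by real image and audio descriptors, and the learned subspace would be further refined using relevance feedback rather than used directly as above.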