Bridging the gap between visual and auditory feature spaces for cross-media retrieval

  • Authors:
  • Hong Zhang; Fei Wu

  • Affiliations:
  • The Institute of Artificial Intelligence, Zhejiang University, Hangzhou, P.R. China (both authors)

  • Venue:
  • MMM'07: Proceedings of the 13th International Conference on Multimedia Modeling, Part I
  • Year:
  • 2007

Abstract

Cross-media retrieval is an interesting research problem that seeks to break through the limitation of modality, so that users can query multimedia objects with examples of a different modality. In this paper we present a novel approach to learning the underlying correlation between visual and auditory feature spaces for cross-media retrieval. A semi-supervised Correlation Preserving Mapping (SSCPM) is described to learn an isomorphic SSCPM subspace in which the canonical correlations between the original visual and auditory features are maximally preserved. Based on relevance feedback from user interactions, local semantic clusters are formed for images and audio clips respectively. By dynamically spreading the ranking scores of positive and negative examples, cross-media semantic correlations are refined and cross-media distance is accurately estimated. Experimental results are encouraging and show that our approach is effective.
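
The abstract describes projecting paired visual and auditory features into a shared subspace that preserves their canonical correlations, then ranking by distance in that subspace. The paper's actual SSCPM algorithm and its relevance-feedback refinement are not reproduced here; as a rough illustration of the underlying idea only, the following minimal Python sketch applies plain canonical correlation analysis (scikit-learn's CCA) to synthetic paired features. All data, dimensions, and parameters are hypothetical assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Hypothetical toy data: 200 paired image/audio feature vectors
# (standing in for, e.g., color/texture descriptors and audio descriptors).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 10))  # shared semantic factor (assumption)
X_img = latent @ rng.normal(size=(10, 64)) + 0.1 * rng.normal(size=(200, 64))
X_aud = latent @ rng.normal(size=(10, 32)) + 0.1 * rng.normal(size=(200, 32))

# Learn paired projections that maximize correlation between the two views,
# mapping both modalities into a common (isomorphic) subspace.
cca = CCA(n_components=10)
cca.fit(X_img, X_aud)
Z_img, Z_aud = cca.transform(X_img, X_aud)

# Cross-media retrieval: query with an audio clip, rank images by
# Euclidean distance in the shared subspace.
query = Z_aud[0]
dists = np.linalg.norm(Z_img - query, axis=1)
ranking = np.argsort(dists)
print("Top-5 images for audio query 0:", ranking[:5].tolist())
```

In the paper's setting, the synthetic features would be replaced by real image and audio descriptors, and the learned subspace would be further refined using relevance feedback rather than used directly as above.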