Transfer tagging from image to video

Authors:
Yang Yang;Yi Yang;Zi Huang;Heng Tao Shen
Affiliations:
The University of Queensland, Brisbane, Australia;Carnegie Mellon University, Pittsburgh, USA;The University of Queensland, Brisbane, Australia;The University of Queensland, Brisbane, Australia
Venue:
MM '11 Proceedings of the 19th ACM international conference on Multimedia
Year:
2011

Citing 6

Integrating structured biological data by Kernel Maximum Mean Discrepancy
Bioinformatics

Cross-domain video concept detection using adaptive svms
Proceedings of the 15th international conference on Multimedia

The MIR flickr retrieval evaluation
MIR '08 Proceedings of the 1st ACM international conference on Multimedia information retrieval

NUS-WIDE: a real-world web image database from National University of Singapore
Proceedings of the ACM International Conference on Image and Video Retrieval

Flexible manifold embedding: a framework for semi-supervised and unsupervised dimension reduction
IEEE Transactions on Image Processing

Harmonizing Hierarchical Manifolds for Multimedia Document Semantics Understanding and Cross-Media Retrieval
IEEE Transactions on Multimedia

Cited 2

Local image tagging via graph regularized joint group sparsity
Pattern Recognition

Effective transfer tagging from image to video
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)

Abstract

Massive amounts of web video data are now emerging on the Internet. To achieve effective and efficient video retrieval, it is critical to automatically assign semantic keywords to videos via content analysis. However, most existing video tagging methods lack sufficient tagged training videos because manual tagging is labor-intensive. Inspired by the observation that far more well-labeled data are available in other, yet relevant, types of media (e.g. images), in this paper we study how to build a "cross-media tunnel" to transfer external tag knowledge from image to video. Meanwhile, the intrinsic data structures of both the image and video spaces are explored for inferring tags. We propose a Cross-Media Tag Transfer (CMTT) paradigm which is able to: 1) transfer tag knowledge between image and video by minimizing their distribution difference; 2) infer tags by revealing the underlying manifold structures embedded within both image and video spaces. We also learn an explicit mapping function to handle unseen videos. Experimental results are reported and analyzed to illustrate the superiority of our proposal.
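
The abstract does not say which measure of "distribution difference" CMTT minimizes. As an illustration only, the sketch below computes one common choice, the squared Maximum Mean Discrepancy with a linear kernel (biased estimate), which reduces to the squared distance between the mean feature vectors of the two sets. The function names, the toy feature vectors, and the assumption that image and video features share one feature space are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(dim)]


def linear_mmd_squared(images: Sequence[Sequence[float]],
                       videos: Sequence[Sequence[float]]) -> float:
    """Squared distance between the two sample means.

    With a linear kernel, the biased estimate of squared MMD is exactly
    ||mean(images) - mean(videos)||^2.
    """
    mu_img = mean_vector(images)
    mu_vid = mean_vector(videos)
    return sum((a - b) ** 2 for a, b in zip(mu_img, mu_vid))


# Hypothetical toy features; real inputs would be visual descriptors.
image_features = [[1.0, 2.0], [3.0, 4.0]]   # mean = (2, 3)
video_features = [[2.0, 2.0], [4.0, 6.0]]   # mean = (3, 4)

gap = linear_mmd_squared(image_features, video_features)
print(gap)  # 2.0
```

A value of zero means the two sample means coincide. A transfer method along these lines would look for a shared representation in which this gap is small, so that tags learned from images carry over to videos.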