Tensor-based transductive learning for multimodality video semantic concept detection

Authors:
Fei Wu; Yanan Liu; Yueting Zhuang
Affiliations:
College of Computer Science and Technology, Zhejiang University, Hangzhou, China (all authors)
Venue:
IEEE Transactions on Multimedia
Year:
2009

Abstract

The interaction and integration of multiple media modalities in video, such as visual, audio, and textual data, are the essence of video semantic analysis. Propagating contextual information is useful for exploiting both intra- and inter-shot correlations. However, the traditional concatenated-vector representation of videos weakens propagation and compensation among the modalities. In this paper, we introduce a higher-order tensor framework for video analysis. We represent the image frames, audio, and text of each video shot as a data point in the form of a 3rd-order tensor. We then propose a novel dimension reduction algorithm that explicitly considers the manifold structure of the tensor space, derived from temporally associated, co-occurring multimodal media data. The algorithm inherently preserves the intrinsic structure of the submanifold from which tensorshots are sampled, and it can map out-of-sample data points directly. We also propose a new transductive support tensor machine algorithm that trains an effective classifier using a large amount of unlabeled data together with the labeled data. Experimental results on the TRECVID 2005 data set show that our method improves the performance of video semantic concept detection.
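
The tensor representation and multilinear projection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the abstract does not say how the three modalities are combined into a tensor, so the rank-one outer-product construction in `shot_tensor` is an assumption, and the projection matrices are fitted with a plain truncated HOSVD-style SVD per mode rather than the manifold-preserving method the paper proposes. All function names and feature dimensions are hypothetical. What the sketch does show is how a 3rd-order tensor is reduced by one projection matrix per mode, and why an unseen shot can be mapped directly once those matrices are learned.

```python
import numpy as np


def shot_tensor(image_feat, audio_feat, text_feat):
    """Assumed construction: rank-one 3rd-order tensor from three feature vectors."""
    return np.einsum("i,j,k->ijk", image_feat, audio_feat, text_feat)


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Mode-n product: apply `matrix` (r x d_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # shape (..., d_mode)
    result = moved @ matrix.T               # shape (..., r)
    return np.moveaxis(result, -1, mode)


def fit_projections(tensors, ranks):
    """Per mode, keep the leading left singular vectors of the stacked unfoldings.

    This is a truncated HOSVD-style fit, used here as a stand-in for the
    paper's manifold-aware dimension reduction.
    """
    projections = []
    for mode, rank in enumerate(ranks):
        stacked = np.hstack([unfold(t, mode) for t in tensors])
        u, _, _ = np.linalg.svd(stacked, full_matrices=False)
        projections.append(u[:, :rank].T)   # shape (rank, d_mode)
    return projections


def project(tensor, projections):
    """Reduce a tensor by applying one projection matrix along each mode."""
    for mode, matrix in enumerate(projections):
        tensor = mode_product(tensor, matrix, mode)
    return tensor


# Synthetic features stand in for real image, audio, and text descriptors.
rng = np.random.default_rng(0)
shots = [
    shot_tensor(rng.normal(size=8), rng.normal(size=6), rng.normal(size=5))
    for _ in range(20)
]
projections = fit_projections(shots, ranks=(3, 2, 2))
reduced = project(shots[0], projections)

# An unseen shot is mapped with the same matrices; no refitting is needed.
new_shot = shot_tensor(rng.normal(size=8), rng.normal(size=6), rng.normal(size=5))
reduced_new = project(new_shot, projections)
print(reduced.shape, reduced_new.shape)
```

Because the learned mapping is a set of explicit linear projections, one per mode, applying it to new data is a handful of matrix products. That is the practical sense in which a method of this kind can map out-of-sample points directly.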