In this paper, we address the problem of multi-modality video representation and semantic concept detection. The interaction and integration of multiple media types in video, such as visual, audio, and textual data, are essential to video semantic analysis. Traditionally, videos are represented as vectors in Euclidean space, and learning algorithms are then applied to these high-dimensional vectors for dimension reduction, classification, clustering, and so on. However, the multiple modalities in video not only have their own properties but also have correlations among them; the simple vector representation weakens the power of these relatively independent modalities and, to some extent, ignores their relations. We introduce a higher-order tensor framework for video analysis, in which we represent the three modalities of a video shot (image, audio, and text) as data points by 3rd-order tensors called tensorshots. We propose a novel dimension reduction method that explicitly considers the manifold structure of the tensor space formed by multimodal media data with temporally associated co-occurrence, and we then detect video semantic concepts with powerful classifiers that take tensors as input. Our algorithm preserves the intrinsic structure of the submanifold from which tensorshots are sampled, and it can also map out-of-sample data points directly. Moreover, we apply an active-learning-based contextual and temporal post-refining strategy to enhance detection accuracy. Experimental results show that our method improves the performance of video semantic concept detection.
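To make the tensorshot idea concrete, the following is a minimal sketch (not the authors' code) of representing a video shot as a 3rd-order tensor over visual, audio, and text feature axes, and reducing its dimensions with a truncated higher-order SVD. The feature sizes and target ranks are illustrative assumptions, and the HOSVD stands in for the paper's manifold-aware reduction method.

```python
import numpy as np

def mode_unfold(T, mode):
    """Mode-n matricization: unfold tensor T along `mode` into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_reduce(T, ranks):
    """Truncated HOSVD: project T onto the leading left singular
    vectors of each mode unfolding, returning the core and factors."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(mode_unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])  # (I_mode x r) orthonormal basis
    core = T
    for mode, U in enumerate(factors):
        # Contract U^T with the core along `mode`, keeping axis order.
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Hypothetical tensorshot: visual x audio x text feature dimensions.
rng = np.random.default_rng(0)
tensorshot = rng.standard_normal((64, 32, 16))
core, factors = hosvd_reduce(tensorshot, ranks=(8, 6, 4))
print(core.shape)  # (8, 6, 4)
```

The reduced core tensor (or its vectorization) would then be fed to a classifier that accepts tensor input; unlike flattening the shot into a single long vector up front, the per-mode factors keep each modality's structure separate until the final contraction.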