In this paper, we address the problem of multi-modality video representation and semantic concept detection. The interaction and integration of multiple media types in video, such as visual, audio, and textual data, are essential to video semantic analysis. Traditionally, videos are represented as vectors in Euclidean space, and learning algorithms are then applied to these high-dimensional vectors for dimension reduction, classification, clustering, and so on. However, the multiple modalities in video not only have their own properties but also have correlations among them; the simple vector representation weakens the power of these relatively independent modalities and, to some extent, ignores their relations. We introduce a higher-order tensor framework for video analysis, in which we represent the three modalities of a video shot (image, audio, and text) as data points by 3rd-order tensors called tensorshots. We propose a novel dimension reduction method that explicitly considers the manifold structure of the tensor space formed by multimodal media data with temporally associated co-occurrence, and we then detect video semantic concepts with powerful classifiers that take tensors as input. Our algorithm preserves the intrinsic structure of the submanifold from which tensorshots are sampled, and it can also map out-of-sample data points directly. Moreover, we apply an active-learning-based contextual and temporal post-refining strategy to enhance detection accuracy. Experimental results show that our method improves the performance of video semantic concept detection.
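To make the tensorshot idea concrete, the following is a minimal sketch (not the authors' code) of representing a video shot as a 3rd-order tensor over visual, audio, and text feature axes, and reducing its dimensions with a truncated higher-order SVD. The feature sizes and target ranks are illustrative assumptions, and the HOSVD stands in for the paper's manifold-aware reduction method.

```python
import numpy as np

def mode_unfold(T, mode):
    """Mode-n matricization: unfold tensor T along `mode` into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_reduce(T, ranks):
    """Truncated HOSVD: project T onto the leading left singular
    vectors of each mode unfolding, returning the core and factors."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(mode_unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])  # (I_mode x r) orthonormal basis
    core = T
    for mode, U in enumerate(factors):
        # Contract U^T with the core along `mode`, keeping axis order.
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Hypothetical tensorshot: visual x audio x text feature dimensions.
rng = np.random.default_rng(0)
tensorshot = rng.standard_normal((64, 32, 16))
core, factors = hosvd_reduce(tensorshot, ranks=(8, 6, 4))
print(core.shape)  # (8, 6, 4)
```

The reduced core tensor (or its vectorization) would then be fed to a classifier that accepts tensor input; unlike flattening the shot into a single long vector up front, the per-mode factors keep each modality's structure separate until the final contraction.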