Sequence Multi-Labeling: A Unified Video Annotation Scheme With Spatial and Temporal Context

Authors:
Yuanning Li; Yonghong Tian; Ling-Yu Duan; Jingjing Yang; Tiejun Huang; Wen Gao
Affiliations:
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China;-;-;-;-;-
Venue:
IEEE Transactions on Multimedia
Year:
2010

Abstract

Automatic video annotation is a challenging yet important problem for content-based video indexing and retrieval. Most existing work formulates annotation as a multi-labeling problem over individual shots. However, video by nature carries rich spatial and temporal context among semantic concepts. In this paper, we formulate video annotation as a sequence multi-labeling (SML) problem over a shot sequence. Unlike annotation paradigms that operate on individual shots, SML predicts a multi-label sequence for consecutive shots through global optimization, incorporating spatial and temporal context into a unified learning framework. We accordingly propose a novel discriminative method, the sequence multi-label support vector machine (SVMSML), to infer the multi-label sequence for a given shot sequence. SVMSML employs a joint kernel to model feature-level and concept-level context relationships, i.e., the dependencies of concepts on low-level features and the spatial and temporal correlations among concepts. A multiple-kernel learning (MKL) algorithm is developed to optimize the weights of the joint kernel together with the SML score function. To search efficiently for the best multi-label sequence over the large output space in both the training and test phases, we adopt an approximate method for maximizing the energy of a binary Markov random field (BMRF). Extensive experiments on the TRECVID'05 and TRECVID'07 datasets show that the proposed SVMSML outperforms state-of-the-art methods.
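
To make the inference step more concrete, the following is a minimal sketch of approximate energy maximization over a binary label grid (shots by concepts). The abstract does not say which approximate method the authors use, so this sketch substitutes iterated conditional modes (ICM), a generic coordinate-ascent heuristic. The names `unary`, `spatial`, and `temporal`, and the specific form of the energy, are illustrative assumptions; in the paper these potentials would be derived from the learned joint kernel.

```python
import numpy as np

def energy(Y, unary, spatial, temporal):
    """Energy of a binary label matrix Y with shape (shots, concepts).

    Assumed form (illustrative, not taken from the paper):
      unary[t, c]     score for concept c in shot t (feature-level term)
      spatial[c, d]   symmetric, zero-diagonal co-occurrence reward within a shot
      temporal[c, d]  reward for concept c in shot t followed by d in shot t+1
    """
    e = np.sum(unary * Y)
    e += 0.5 * np.sum((Y @ spatial) * Y)       # each unordered pair counted once
    e += np.sum((Y[:-1] @ temporal) * Y[1:])   # consecutive-shot pairs
    return float(e)

def icm(unary, spatial, temporal, max_sweeps=20):
    """Iterated conditional modes: greedily set one label at a time."""
    T, C = unary.shape
    Y = (unary > 0).astype(float)  # start from independent per-shot decisions
    for _ in range(max_sweeps):
        changed = False
        for t in range(T):
            for c in range(C):
                # Energy gained by setting Y[t, c] = 1 instead of 0,
                # holding every other label fixed.
                gain = unary[t, c] + spatial[c] @ Y[t]
                if t > 0:
                    gain += Y[t - 1] @ temporal[:, c]
                if t < T - 1:
                    gain += temporal[c] @ Y[t + 1]
                new = 1.0 if gain > 0 else 0.0
                if new != Y[t, c]:
                    Y[t, c] = new
                    changed = True
        if not changed:  # local optimum reached
            break
    return Y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, C = 5, 4
    unary = rng.normal(size=(T, C))
    S = rng.normal(scale=0.5, size=(C, C))
    spatial = (S + S.T) / 2.0
    np.fill_diagonal(spatial, 0.0)
    temporal = rng.normal(scale=0.5, size=(C, C))
    Y = icm(unary, spatial, temporal)
    print(Y.astype(int))
    print(energy(Y, unary, spatial, temporal))
```

Because the energy is linear in each individual label when the others are fixed, every ICM update leaves the energy unchanged or higher, so the result is at least as good as the independent per-shot baseline it starts from. It is only a local optimum, though, and no guarantee of global optimality is implied.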