Multi-layer multi-instance kernel for video concept detection

Authors:
Zhiwei Gu;Tao Mei;Xian-Sheng Hua;Jinhui Tang;Xiuqing Wu
Affiliations:
University of Science and Technology of China, Hefei, China;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China;University of Science and Technology of China, Hefei, China;University of Science and Technology of China, Hefei, China
Venue:
Proceedings of the 15th international conference on Multimedia
Year:
2007

Abstract

In video concept detection, most existing methods have not adequately exploited the intrinsic hierarchical structure of video content. Unlike the flat attribute-value data assumed by many existing methods, video is essentially a structured medium with a multi-layer representation. For example, a video can be represented by a hierarchy consisting, from large to small, of shots, key-frames, and regions. Moreover, it fits the typical Multi-Instance (MI) setting, in which the "bag-instance" correspondence is embedded between contiguous layers. In this paper, we refer to this multi-layer structure, together with the "bag-instance" relation embedded in it, as the Multi-Layer Multi-Instance (MLMI) setting. We formulate video concept detection as an MLMI learning problem in which a rooted tree that embeds the MLMI nature is devised to represent a video segment. Furthermore, by fusing the information from different layers, we construct a novel MLMI kernel to measure the similarities between instances in the same and in different layers. In contrast to traditional MI learning, the proposed kernel leverages both the Multi-Layer structure and the Multi-Instance relations simultaneously. We applied the MLMI kernel to the concept detection task on the TRECVID 2005 corpus and report superior performance (+25% in Mean Average Precision) compared with standard Support Vector Machine based approaches.
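
The shot, key-frame, and region hierarchy described above can be pictured as a small rooted tree, with each node acting as a "bag" for its children. The following is a minimal sketch only, not the MLMI kernel proposed in the paper: the `Node` class, the `rbf` base kernel, the `gamma` value, and the recursive averaging rule in `tree_kernel` are all illustrative assumptions. For simplicity it compares nodes at the same layer only, whereas the abstract states that the actual kernel also relates instances in different layers.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of the rooted tree: a shot, a key-frame, or a region."""
    features: List[float]
    children: List["Node"] = field(default_factory=list)


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) base kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def tree_kernel(a, b, gamma=0.5):
    """Hypothetical layer-wise kernel (an assumption, not the paper's):
    similarity of the two nodes plus the mean similarity over all
    pairs of their child subtrees, applied recursively."""
    k = rbf(a.features, b.features, gamma)
    if a.children and b.children:
        total = sum(tree_kernel(ca, cb, gamma)
                    for ca in a.children for cb in b.children)
        k += total / (len(a.children) * len(b.children))
    return k


def normalized_tree_kernel(a, b, gamma=0.5):
    """Scale the kernel so that every tree has self-similarity 1."""
    denom = math.sqrt(tree_kernel(a, a, gamma) * tree_kernel(b, b, gamma))
    return tree_kernel(a, b, gamma) / denom


# Toy data: shot -> key-frame -> regions (feature values are made up).
shot_a = Node([0.2], [Node([0.5], [Node([0.0, 0.0]), Node([1.0, 0.0])])])
shot_b = Node([0.2], [Node([0.5], [Node([0.0, 0.0])])])

similarity = normalized_tree_kernel(shot_a, shot_b)
```

A matrix of such pairwise similarities between video segments could then be passed to any kernel classifier that accepts a precomputed kernel, which is how a kernel of this kind would typically be combined with a Support Vector Machine.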