Sequence-based kernels for online concept detection in video

Authors:
Werner Bailer
Affiliations:
JOANNEUM RESEARCH, Graz, Austria
Venue:
AIEMPro '11 Proceedings of the 2011 ACM international workshop on Automated media analysis and production for novel TV services
Year:
2011

Abstract

Kernel methods such as Support Vector Machines have been successfully applied to classification problems such as concept detection in video. To capture concepts and events with a longer temporal extent, kernels for sequences of feature vectors have been proposed, e.g., based on temporal pyramid matching or sequence alignment. However, all of these approaches need a temporal segmentation of the video, because the kernel is applied to the feature vectors of a segment. In (semi-)supervised training this is not a problem, as the ground truth is annotated on a temporal segment. When performing online concept detection on a live video stream, however, (i) no segmentation exists and (ii) the latency must be kept as low as possible. Re-evaluating the kernel for each temporal position of a sliding window is computationally prohibitive. We thus propose variants of the temporal pyramid matching, all-subsequences, and longest-common-subsequence kernels that can be calculated efficiently for a temporal sliding window. An arbitrary kernel function can be plugged in to determine the similarity of the feature vectors of individual samples. We evaluate the proposed kernels on the TRECVID 2007 High-level Feature Extraction data set and show that the sliding window variants for online detection perform as well as or better than the segment-based ones, while the runtime is reduced by at least 30%.
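The cost argument in the abstract can be illustrated with a minimal Python sketch. It does not reproduce the kernels proposed in the paper. It assumes a deliberately simple window similarity (the mean, over the frames in the window, of each frame's best base-kernel match against a fixed reference sequence), which is not guaranteed to be a valid positive semi-definite kernel. The names `rbf`, `SlidingWindowMatcher`, and `batch_value` are hypothetical and chosen for illustration only. The point shown is that caching per-frame results lets each new frame be processed with one pass over the reference sequence, rather than re-evaluating the whole window.

```python
from collections import deque
import math


def rbf(x, y, gamma=0.5):
    """Base kernel on single feature vectors (assumed here; any kernel could be plugged in)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class SlidingWindowMatcher:
    """Hypothetical helper: similarity between the last `size` stream frames
    and a fixed reference sequence, updated incrementally per frame."""

    def __init__(self, reference, size, base_kernel=rbf):
        self.reference = reference
        self.size = size
        self.base_kernel = base_kernel
        self.scores = deque()  # cached best-match score of each frame in the window
        self.total = 0.0       # running sum of the cached scores

    def push(self, frame):
        # Only the new frame is compared against the reference sequence.
        best = max(self.base_kernel(frame, r) for r in self.reference)
        self.scores.append(best)
        self.total += best
        # Drop the contribution of the frame that leaves the window.
        if len(self.scores) > self.size:
            self.total -= self.scores.popleft()
        return self.total / len(self.scores)


def batch_value(window, reference, base_kernel=rbf):
    """Naive re-evaluation of the same similarity for one window position."""
    best_scores = [max(base_kernel(f, r) for r in reference) for f in window]
    return sum(best_scores) / len(window)


if __name__ == "__main__":
    reference = [(0.0, 0.0), (1.0, 1.0)]
    stream = [(0.0, 0.1), (0.9, 1.0), (5.0, 5.0), (1.0, 1.0)]
    matcher = SlidingWindowMatcher(reference, size=3)
    for frame in stream:
        print(round(matcher.push(frame), 4))
```

With a window of W frames and a reference sequence of length n, the naive version needs W * n base-kernel evaluations per window position, while the incremental version needs n. The kernels named in the abstract have more structure than this toy similarity, so their sliding-window variants require more than a running sum.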