Kernel methods, e.g. Support Vector Machines, have been applied successfully to classification problems such as concept detection in video. To capture concepts and events with a longer temporal extent, kernels for sequences of feature vectors have been proposed, e.g. based on temporal pyramid matching or sequence alignment. However, all these approaches require a temporal segmentation of the video, because the kernel is applied to the feature vectors of a segment. In (semi-)supervised training this is not a problem, since the ground truth is annotated on temporal segments. When performing online concept detection on a live video stream, however, (i) no segmentation exists and (ii) latency must be kept as low as possible. Re-evaluating the kernel at every temporal position of a sliding window is computationally prohibitive. We therefore propose variants of the temporal pyramid matching, all-subsequences, and longest common subsequence kernels that can be calculated efficiently for a temporal sliding window. An arbitrary kernel function can be plugged in to determine the similarity between the feature vectors of individual samples. We evaluate the proposed kernels on the TRECVID 2007 High-level Feature Extraction data set and show that the sliding window variants for online detection perform as well as or better than the segment-based ones, while reducing runtime by at least 30%.
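The abstract does not give the kernel definitions, so the following is only a minimal sketch of why evaluating a sequence kernel incrementally over a sliding window can be cheaper than re-evaluating it at every stream position. It deliberately uses a plain summation kernel (the mean base-kernel value over all pairs of reference-segment frames and window frames), not the temporal pyramid matching, all-subsequences, or LCS variants proposed here. The names `rbf`, `segment_kernel`, and `SlidingSumKernel`, the choice of an RBF base kernel, and the `gamma` value are illustrative assumptions; any base kernel k(x, y) can be passed in.

```python
import math
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence

Vector = Sequence[float]
BaseKernel = Callable[[Vector, Vector], float]


def rbf(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) base kernel; an assumed example, any k(x, y) can be plugged in."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def segment_kernel(ref: List[Vector], window: List[Vector], k: BaseKernel) -> float:
    """Naive summation kernel: mean of k over all (reference frame, window frame) pairs.

    Recomputing this at each stream position costs len(ref) * len(window) calls to k.
    """
    total = sum(k(x, s) for x in ref for s in window)
    return total / (len(ref) * len(window))


class SlidingSumKernel:
    """Hypothetical incremental version of segment_kernel for a live stream.

    Each incoming frame is compared with the reference segment once
    (len(ref) base-kernel calls). The window total is then updated by adding
    the new frame's column sum and subtracting the column sum of the frame
    that drops out of the window.
    """

    def __init__(self, ref: List[Vector], width: int, k: BaseKernel = rbf) -> None:
        self.ref = ref
        self.width = width
        self.k = k
        self.columns: Deque[float] = deque()  # column sums of frames currently in the window
        self.total = 0.0  # running sum of all column sums in the window

    def push(self, frame: Vector) -> Optional[float]:
        """Add one frame; return the kernel value once the window is full, else None."""
        col = sum(self.k(x, frame) for x in self.ref)
        self.columns.append(col)
        self.total += col
        if len(self.columns) > self.width:
            self.total -= self.columns.popleft()  # oldest frame leaves the window
        if len(self.columns) < self.width:
            return None
        return self.total / (len(self.ref) * self.width)


if __name__ == "__main__":
    ref = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]  # toy reference segment
    stream = [[0.1 * i, 0.05 * i] for i in range(20)]  # toy live stream
    width = 5
    sk = SlidingSumKernel(ref, width)
    for t, frame in enumerate(stream):
        score = sk.push(frame)
        if score is not None:
            naive = segment_kernel(ref, stream[t - width + 1 : t + 1], rbf)
            assert math.isclose(score, naive, rel_tol=1e-9, abs_tol=1e-12)
            print(t, round(score, 4))
```

Per new frame, this sketch makes len(ref) base-kernel calls instead of len(ref) * width, and it matches the naive recomputation up to floating-point rounding. Kernels that depend on frame order, such as alignment- or LCS-based ones, would need more bookkeeping than a single running sum.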