Sequence-based kernels for online concept detection in video

  • Authors: Werner Bailer
  • Affiliations: JOANNEUM RESEARCH, Graz, Austria
  • Venue: AIEMPro '11: Proceedings of the 2011 ACM international workshop on Automated media analysis and production for novel TV services
  • Year: 2011

Abstract

Kernel methods, e.g., Support Vector Machines, have been successfully applied to classification problems such as concept detection in video. To capture concepts and events with a longer temporal extent, kernels for sequences of feature vectors have been proposed, e.g., based on temporal pyramid matching or sequence alignment. However, all of these approaches need a temporal segmentation of the video, as the kernel is applied to the feature vectors of a segment. In (semi-)supervised training this is not a problem, as the ground truth is annotated on temporal segments. When performing online concept detection on a live video stream, (i) no segmentation exists and (ii) the latency must be kept as low as possible. Re-evaluating the kernel at each temporal position of a sliding window is prohibitive due to the computational effort. We thus propose variants of the temporal pyramid matching, all-subsequences, and longest common subsequence kernels that can be calculated efficiently for a temporal sliding window. An arbitrary kernel function can be plugged in to determine the similarity between the feature vectors of individual samples. We evaluate the proposed kernels on the TRECVID 2007 High-level Feature Extraction data set and show that the sliding-window variants for online detection perform as well as or better than the segment-based ones, while the runtime is reduced by at least 30%.
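
To make the setting concrete, the sketch below (not from the paper) shows the naive baseline the abstract argues against: a soft longest-common-subsequence score between a window of the stream and a prototype sequence, recomputed from scratch at every window position, with an arbitrary base kernel plugged in to compare individual feature vectors. All names and parameters (soft_lcs_kernel, detect_online, the RBF base kernel, the match threshold) are illustrative assumptions; the paper's contribution is precisely to replace this per-window recomputation with efficiently updatable sliding-window kernels.

    import numpy as np

    def rbf_kernel(x, y, gamma=1.0):
        # Base kernel on individual feature vectors; any positive
        # semi-definite kernel could be plugged in here instead.
        d = x - y
        return np.exp(-gamma * np.dot(d, d))

    def soft_lcs_kernel(S, T, base_kernel=rbf_kernel, threshold=0.5):
        # Soft longest-common-subsequence score between two sequences of
        # feature vectors: two samples "match" when the base kernel value
        # exceeds a threshold. Standard O(|S| * |T|) dynamic program.
        m, n = len(S), len(T)
        D = np.zeros((m + 1, n + 1))
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sim = base_kernel(S[i - 1], T[j - 1])
                if sim >= threshold:
                    D[i, j] = D[i - 1, j - 1] + sim
                else:
                    D[i, j] = max(D[i - 1, j], D[i, j - 1])
        return D[m, n] / max(m, n)  # normalise by sequence length

    def detect_online(stream, prototype, window=50, score_thresh=0.6):
        # Naive online detector: keeps a sliding window over the stream
        # and recomputes the full DP table at every position. This
        # per-window recomputation is exactly the cost that the paper's
        # sliding-window kernel variants avoid by updating incrementally.
        buf = []
        for t, frame in enumerate(stream):
            buf.append(frame)
            if len(buf) > window:
                buf.pop(0)
            if len(buf) == window:
                score = soft_lcs_kernel(buf, prototype)
                if score >= score_thresh:
                    yield t, score

With window length w and prototype length n, each new frame costs O(w * n) base-kernel evaluations in this naive form; it is this repeated work per temporal position that makes low-latency online detection impractical without an incremental reformulation of the kernel computation.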