Segmental multi-way local pooling for video recognition

  • Authors: Ilseo Kim; Sangmin Oh; Arash Vahdat; Kevin Cannons; A.G. Amitha Perera; Greg Mori

  • Affiliations (in author order): Kitware Inc., Clifton Park, NY, USA; Kitware Inc., Clifton Park, NY, USA; Simon Fraser University, Burnaby, BC, Canada; Simon Fraser University, Burnaby, BC, Canada; Kitware Inc., Clifton Park, NY, USA; Simon Fraser University, Burnaby, BC, Canada

  • Venue: Proceedings of the 21st ACM international conference on Multimedia
  • Year: 2013

Abstract

In this work, we address the problem of complex event detection in unconstrained videos. We introduce a novel multi-way feature pooling approach that leverages segment-level information. The approach is simple and widely applicable to diverse audio-visual features. It relies on a set of clusters discovered via unsupervised clustering of segment-level features; depending on feature characteristics, these can include not only scene-based clusters but also motion- and audio-based clusters. Every video is then represented with multiple descriptors, each designed to relate to one of the pre-built clusters. For classification, intersection kernel SVMs are used, where the kernel is obtained by combining multiple kernels computed from the corresponding per-cluster descriptor pairs. Evaluation on the TRECVID'11 MED dataset shows a significant improvement over the state of the art.
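
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how segments are assigned to clusters, how they are pooled, or how the per-cluster kernels are combined, so nearest-centroid assignment, average pooling, and an equally weighted average of kernels are all assumptions made here for illustration. The function names (`nearest_cluster`, `multiway_pool`, `combined_kernel`) and the toy centroids are hypothetical, and the clustering step itself is taken as already done.

```python
from math import dist


def nearest_cluster(segment, centroids):
    """Index of the centroid closest to a segment-level feature (Euclidean)."""
    return min(range(len(centroids)), key=lambda k: dist(segment, centroids[k]))


def multiway_pool(segments, centroids):
    """Return one descriptor per cluster for a single video.

    Assumption: each segment is assigned to its nearest centroid and the
    segments of a cluster are average-pooled. A cluster with no assigned
    segment yields an all-zero descriptor.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for seg in segments:
        k = nearest_cluster(seg, centroids)
        counts[k] += 1
        for d, value in enumerate(seg):
            sums[k][d] += value
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else sums[k]
        for k in range(len(centroids))
    ]


def intersection(x, y):
    """Histogram intersection kernel between two non-negative descriptors."""
    return sum(min(a, b) for a, b in zip(x, y))


def combined_kernel(desc_a, desc_b):
    """Combine per-cluster intersection kernels (assumption: equal weights)."""
    per_cluster = [intersection(a, b) for a, b in zip(desc_a, desc_b)]
    return sum(per_cluster) / len(per_cluster)


# Toy data: two pre-built clusters and two videos with 2-D segment features.
centroids = [[1.0, 0.0], [0.0, 1.0]]
video_a = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
video_b = [[0.8, 0.2], [0.1, 0.9]]

desc_a = multiway_pool(video_a, centroids)
desc_b = multiway_pool(video_b, centroids)
k_ab = combined_kernel(desc_a, desc_b)
```

Evaluating `combined_kernel` over all pairs of training videos gives a Gram matrix that could be passed to any SVM implementation accepting a precomputed kernel. Averaging keeps the combined value on the same scale as a single intersection kernel; a weighted combination would be an equally plausible reading of the abstract.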