Visual code-sentences: a new video representation based on image descriptor sequences

  • Authors:
  • Yusuke Mitarai; Masakazu Matsugu

  • Affiliations:
  • Canon Inc. Digital System Technology Development Headquarters, Tokyo, Japan; Canon Inc. Digital System Technology Development Headquarters, Tokyo, Japan

  • Venue:
  • ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part I
  • Year:
  • 2012

Abstract

We present a new descriptor-sequence model for action recognition that enhances discriminative power in the spatio-temporal context while maintaining robustness against background clutter and against variability in inter- and intra-person behavior. We extend the framework of Dense Trajectories-based activity recognition (Wang et al., 2011) and introduce a pool of dynamic Bayesian networks (e.g., multiple HMMs) with histogram descriptors as codebooks of composite action categories represented at the respective key points. The full set of codebooks, bound to spatio-temporal interest points, constitutes an intermediate feature representation that serves as a basis for generic action categories. This representation scheme is intended to serve as visual code-sentences that subsume a rich vocabulary of basis action categories. Through extensive experiments on the KTH, UCF Sports, and Hollywood2 datasets, we demonstrate some improvements over state-of-the-art methods.
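The abstract gives no implementation details, so the following is only a minimal sketch, under stated assumptions, of how a pool of HMMs can turn a sequence of histogram descriptors sampled along a trajectory into an intermediate feature vector: each descriptor is quantized to its nearest codeword, and the resulting symbol sequence is scored under every discrete HMM in the pool. The codebook, the HMM parameters, and the helper names (`quantize`, `log_likelihood`, `code_sentence_feature`) are hypothetical. Nearest-codeword quantization and the scaled forward algorithm are standard choices and are not necessarily what the authors used.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical discrete HMM: (initial probs, transition matrix, emission matrix).
# Emissions are over codeword indices obtained by quantizing histogram descriptors.
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def quantize(descriptor: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def log_likelihood(symbols: Sequence[int], hmm: HMM) -> float:
    """Scaled forward algorithm: log P(symbols | hmm) for a discrete HMM."""
    if not symbols:
        return 0.0  # the empty sequence has probability 1
    pi, A, B = hmm
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][symbols[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(symbols)):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbols[t]]
                for j in range(n)
            ]
        scale = sum(alpha)  # equals P(o_t | o_1..o_{t-1})
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # normalise to avoid underflow
    return log_p


def code_sentence_feature(descriptors, codebook, hmms) -> List[float]:
    """Quantize a descriptor sequence and score it under every HMM in the pool."""
    symbols = [quantize(d, codebook) for d in descriptors]
    return [log_likelihood(symbols, hmm) for hmm in hmms]


if __name__ == "__main__":
    # Toy 2-bin "histograms", a 2-word codebook and two hypothetical HMMs.
    codebook = [[1.0, 0.0], [0.0, 1.0]]
    walk = ([0.6, 0.4], [[0.7, 0.3], [0.2, 0.8]], [[0.9, 0.1], [0.3, 0.7]])
    wave = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])
    track = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.8, 0.2]]
    print(code_sentence_feature(track, codebook, [walk, wave]))
```

The per-model log-likelihoods form one fixed-length vector per trajectory regardless of its length, which is what lets a downstream classifier treat the pool of models as a vocabulary of basis actions.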