Exploring probabilistic localized video representation for human action recognition

  • Authors:
  • Yan Song; Sheng Tang; Yan-Tao Zheng; Tat-Seng Chua; Yongdong Zhang; Shouxun Lin

  • Affiliations:
  • Laboratory of Advanced Computing Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 10090 and Graduate University of the Chinese Academy of Sciences, Beijing, ...; Laboratory of Advanced Computing Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 10090; Institute for Infocomm Research, A*STAR, Singapore, Singapore; School of Computing, National University of Singapore, Singapore, Singapore; Laboratory of Advanced Computing Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 10090; Laboratory of Advanced Computing Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 10090

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2012

Abstract

In recent years, bag-of-words (BoW) video representations have achieved promising results in recognizing human actions in videos. Built by vector-quantizing local spatial-temporal (ST) features, the BoW representation offers simplicity and efficiency, but it also has limitations. First, discretizing the feature space inevitably introduces ambiguity and information loss into the video representation. Second, there is no universal codebook for BoW: the codebook must be rebuilt whenever the video corpus changes. To tackle these issues, this paper explores a localized, continuous, and probabilistic video representation. Specifically, the proposed representation encodes the visual and motion information of the ensemble of local ST features in a video as a distribution estimated by a generative probabilistic model. Furthermore, the probabilistic representation naturally gives rise to an information-theoretic distance metric between videos. This makes the representation readily applicable to most discriminative classifiers, such as nearest-neighbor schemes and kernel-based classifiers. Experiments on two datasets, KTH and UCF Sports, show that the proposed approach can deliver promising results.
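The contrast between hard BoW quantization and a per-video distribution with an information-theoretic distance can be illustrated with a small sketch. The abstract does not say which generative model or distance the authors use, so everything below is an assumption for illustration: each video is summarized by a single diagonal Gaussian fitted to its local ST feature vectors (a simplified stand-in for a richer generative model such as a mixture), videos are compared with the symmetric Kullback-Leibler divergence, and classification is by nearest neighbor. The function names (`bow_histogram`, `fit_diag_gaussian`, `symmetric_kl`, `kl_kernel`, `nearest_neighbor`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical per-video model: (per-dimension means, per-dimension variances).
Gaussian = Tuple[List[float], List[float]]


def bow_histogram(features: List[Vector], codebook: List[Vector]) -> List[float]:
    """Baseline BoW: hard-assign each local feature to its nearest codeword and
    return a normalized histogram. The hard assignment is the discretization step."""
    counts = [0] * len(codebook)
    for f in features:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(f, codebook[k])),
        )
        counts[nearest] += 1
    return [c / len(features) for c in counts]


def fit_diag_gaussian(features: List[Vector], eps: float = 1e-6) -> Gaussian:
    """Assumed stand-in for the generative model: fit one diagonal Gaussian
    (maximum-likelihood mean and variance) to all local features of a video.
    `eps` keeps variances strictly positive."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n + eps for d in range(dim)]
    return mean, var


def kl_diag_gaussian(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) between two diagonal Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrized KL divergence, one possible information-theoretic distance."""
    return kl_diag_gaussian(p, q) + kl_diag_gaussian(q, p)


def kl_kernel(p: Gaussian, q: Gaussian, sigma: float = 1.0) -> float:
    """Exponentiated-distance kernel for kernel classifiers. Note: this kernel is
    not guaranteed to be positive semi-definite."""
    return math.exp(-symmetric_kl(p, q) / sigma)


def nearest_neighbor(query: Gaussian, labeled: List[Tuple[Gaussian, str]]) -> str:
    """Return the label of the training video closest to `query` under symmetric KL."""
    return min(labeled, key=lambda item: symmetric_kl(query, item[0]))[1]


if __name__ == "__main__":
    # Toy 2-D "local features" for two training videos and one query video.
    walk = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]]
    run = [[5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]]
    query = [[0.1, 0.0], [0.0, 0.1], [-0.2, 0.1], [0.15, -0.05]]
    train = [(fit_diag_gaussian(walk), "walk"), (fit_diag_gaussian(run), "run")]
    print(nearest_neighbor(fit_diag_gaussian(query), train))
```

Because each video is compared through its own fitted distribution, this sketch needs no shared codebook. That mirrors the abstract's second motivation, although the authors' actual model and distance may differ.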