Human action recognition via multi-view learning

  • Authors:
  • Tianzhu Zhang, Si Liu, Changsheng Xu, Hanqing Lu

  • Affiliations:
  • Institute of Automation, Beijing, P.R. China and China-Singapore Institute of Digital Media, Singapore (all authors)

  • Venue:
  • ICIMCS '10 Proceedings of the Second International Conference on Internet Multimedia Computing and Service
  • Year:
  • 2010


Abstract

In this paper, we propose a novel approach to automatically learning a compact yet discriminative representation for human action recognition. To capture both static visual information and motion information, each frame is represented by two feature subsets (views), and a Gaussian Mixture Model (GMM) is adopted to model the distribution of these features. To complement the strengths of the different features (views), a Co-EM based multi-view learning framework is introduced to estimate the parameters of the GMM instead of conventional single-view EM. The Gaussian components are then treated as video words to describe videos at different temporal resolutions. Compared with traditional action recognition methods, the proposed Co-EM strategy offers several advantages. First, complex actions are efficiently modeled by the GMM, and the number of its components is determined automatically with the Minimum Description Length (MDL) criterion. Second, because the imperfection of a single view can be compensated by the other view in Co-EM, the resulting bag of video words is superior to that formed from any single view. To the best of our knowledge, we are the first to apply the Co-EM based multi-view learning method to action recognition, and we obtain significantly better results. We extensively verify the proposed approach on two publicly available challenging datasets, the KTH and Weizmann datasets, and the experimental results show the validity of our method.
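The following is a minimal sketch, not the authors' implementation, of how a Co-EM style training loop over a shared GMM could look: EM steps alternate between a static-appearance view and a motion view of the same frames, with each view's posterior responsibilities passed to the other view's M-step, and the fitted components then used as video words. The function and variable names (co_em, m_step, e_step, video_words, X_static, X_motion) are hypothetical; the MDL-based choice of the number of components is not shown and would in practice wrap co_em, refitting for several component counts and keeping the one with the shortest description length.

```python
import numpy as np
from scipy.stats import multivariate_normal

def m_step(X, resp):
    # Re-estimate GMM weights, means and covariances from soft assignments.
    N, K = resp.shape
    nk = resp.sum(axis=0) + 1e-10
    weights = nk / N
    means = (resp.T @ X) / nk[:, None]
    covs = []
    for k in range(K):
        diff = X - means[k]
        cov = (resp[:, k, None] * diff).T @ diff / nk[k]
        covs.append(cov + 1e-6 * np.eye(X.shape[1]))  # ridge for stability
    return weights, means, np.array(covs)

def e_step(X, weights, means, covs):
    # Posterior responsibility of each Gaussian component for each frame.
    lik = np.column_stack([
        weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
        for k in range(len(weights))
    ]) + 1e-300
    return lik / lik.sum(axis=1, keepdims=True)

def co_em(X_static, X_motion, n_components=10, n_iters=20, seed=0):
    # Alternate EM between the two views; the responsibilities produced by
    # one view's E-step drive the other view's next M-step.
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet(np.ones(n_components), size=X_static.shape[0])
    params = {}
    for _ in range(n_iters):
        for name, X in (("static", X_static), ("motion", X_motion)):
            params[name] = m_step(X, resp)   # update this view's GMM
            resp = e_step(X, *params[name])  # hand responsibilities to the other view
    return params, resp

def video_words(resp):
    # Each Gaussian component acts as a video word; a clip is summarized by
    # the normalized histogram of per-frame hard assignments.
    hist = np.bincount(resp.argmax(axis=1), minlength=resp.shape[1]).astype(float)
    return hist / hist.sum()
```

Under these assumptions, a clip whose frames yield feature matrices X_static and X_motion would be encoded as video_words(e_step(X_static, *params["static"])), and the resulting histograms fed to any standard classifier; histograms could also be computed over windows of different lengths to obtain multiple time resolutions, as the abstract describes.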