Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

  • Authors:
  • Juan Carlos Niebles; Hongcheng Wang; Li Fei-Fei

  • Affiliations:
  • Department of Electrical Engineering, Princeton University, Engineering Quadrangle, Princeton, USA 08544 and Robotics and Intelligent Systems Group, Universidad del Norte, Barranquilla, Colombia; United Technologies Research Center (UTRC), East Hartford, USA 06108; Department of Computer Science, Princeton University, Princeton, USA 08540

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2008

Abstract

We present a novel unsupervised learning method for human action categories. A video sequence is represented as a collection of spatial-temporal words by extracting space-time interest points. The algorithm automatically learns the probability distributions of the spatial-temporal words and the intermediate topics corresponding to human action categories. This is achieved by using latent topic models such as the probabilistic Latent Semantic Analysis (pLSA) model and Latent Dirichlet Allocation (LDA). Because these models are probabilistic, our approach can handle noisy feature points arising from dynamic backgrounds and moving cameras. Given a novel video sequence, the algorithm can categorize and localize the human action(s) contained in the video. We test our algorithm on three challenging datasets: the KTH human motion dataset, the Weizmann human action dataset, and a recent dataset of figure skating actions. Our results reflect the promise of such a simple approach. In addition, our algorithm can recognize and localize multiple actions in long and complex video sequences containing multiple motions.
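
To make the pipeline in the abstract concrete, here is a minimal sketch in Python using scikit-learn. It is not the authors' implementation: the random arrays stand in for real space-time interest point descriptors (the paper extracts these with a dedicated detector), the codebook size and topic count are illustrative rather than the paper's settings, and LDA is used because scikit-learn does not ship pLSA.

```python
# Sketch of the bag-of-spatial-temporal-words pipeline: cluster local
# descriptors into a codebook, represent each video as a word histogram,
# then learn latent topics that ideally align with action categories.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Stand-in for extracted space-time interest point descriptors:
# one (n_points, descriptor_dim) array per video sequence.
videos = [rng.normal(size=(rng.integers(50, 200), 100)) for _ in range(20)]

# 1. Build a codebook of "spatial-temporal words" by clustering all descriptors.
codebook = KMeans(n_clusters=50, n_init=10, random_state=0)
codebook.fit(np.vstack(videos))

# 2. Represent each video as a histogram of word counts (bag of words).
def word_histogram(descriptors):
    words = codebook.predict(descriptors)
    return np.bincount(words, minlength=codebook.n_clusters)

counts = np.array([word_histogram(v) for v in videos])

# 3. Learn latent topics over the word histograms; each topic plays the
#    role of an intermediate action category, learned without labels.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topic_mix = lda.fit_transform(counts)

# 4. Categorize each video by its dominant topic.
print("predicted action topic per video:", topic_mix.argmax(axis=1))
```

Localization, which the sketch omits, follows the same idea: each interest point can be assigned to its most probable topic, so the points voting for an action topic mark where in the frame and when in the sequence that action occurs.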