Middle-level representation for human activities recognition: the role of spatio-temporal relationships

  • Authors: Fei Yuan, Véronique Prinet, Junsong Yuan
  • Affiliations: LIAMA & NLPR, CASIA, Chinese Academy of Sciences, Beijing, China (Fei Yuan, Véronique Prinet); School of EEE, Nanyang Technological University, Singapore (Junsong Yuan)
  • Venue: ECCV'10 Proceedings of the 11th European Conference on Trends and Topics in Computer Vision - Volume Part I
  • Year: 2010


Abstract

We tackle the challenging problem of human activity recognition in realistic video sequences. Unlike local feature-based methods or global template-based methods, we propose to represent a video sequence by a set of middle-level parts. A part, or component, has consistent spatial structure and consistent motion. We first segment the visual motion patterns and generate a set of middle-level components by clustering keypoint-based trajectories extracted from the video. To further exploit the interdependencies of the moving parts, we then define spatio-temporal relationships between pairwise components. The resulting descriptive middle-level components and pairwise components thereby capture the essential motion characteristics of human activities. They also give a very compact representation of the video. We apply our framework to two popular and challenging video datasets: the Weizmann dataset and the UT-Interaction dataset. We demonstrate experimentally that our middle-level representation, combined with a χ²-SVM classifier, matches or outperforms state-of-the-art results on these datasets.
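The final classification stage described in the abstract, an SVM with a χ² kernel over per-video feature histograms, can be sketched as follows. This is a minimal illustration, not the authors' code: the middle-level component extraction (trajectory clustering and pairwise spatio-temporal relations) is replaced by random stand-in histograms, and the sizes and labels are hypothetical.

```python
# Hedged sketch: χ²-kernel SVM classification of per-video histograms,
# assuming each video has already been encoded as a histogram over
# middle-level components and pairwise relations (stand-in: random data).
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_bins = 40, 10, 32  # hypothetical dataset sizes

# Stand-in for per-video feature histograms (rows sum to arbitrary mass)
X_train = rng.random((n_train, n_bins))
X_test = rng.random((n_test, n_bins))
y_train = rng.integers(0, 2, n_train)  # two hypothetical activity classes

# Precompute the χ² kernel matrices and train an SVM on them
K_train = chi2_kernel(X_train, X_train)
K_test = chi2_kernel(X_test, X_train)

clf = SVC(kernel="precomputed").fit(K_train, y_train)
pred = clf.predict(K_test)
print(pred.shape)  # one predicted activity label per test video
```

With real features, `X_train`/`X_test` would be the histograms produced by the middle-level representation; precomputing the kernel keeps the χ² similarity separate from the SVM solver.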