Recognizing Interactive Group Activities Using Temporal Interaction Matrices and Their Riemannian Statistics

  • Authors:
  • Ruonan Li
  • Rama Chellappa
  • Shaohua Kevin Zhou

  • Affiliations:
  • Harvard School of Engineering and Applied Sciences, Cambridge, USA 02138
  • Center for Automation Research, UMIACS, and the Department of Electrical and Computer Engineering, University of Maryland, College Park, USA 20742
  • Corporate Research & Technology, Siemens Corporation, Princeton, USA 08540

  • Venue:
  • International Journal of Computer Vision
  • Year:
  • 2013

Abstract

While video-based activity analysis and recognition have received much attention, a large body of existing work deals with activities of a single subject. The main objective of this paper is the modeling and recognition of coordinated multi-subject activities, or group activities, which arise in applications such as surveillance, sports, and biological monitoring. Unlike earlier attempts, which model the complex spatio-temporal constraints among multiple subjects with a parametric Bayesian network, we propose a compact and discriminative descriptor, referred to as the Temporal Interaction Matrix, for representing a coordinated group motion pattern. Moreover, we characterize the space of Temporal Interaction Matrices as the Discriminative Temporal Interaction Manifold (DTIM), and use it as a framework within which we develop a data-driven strategy for characterizing group motion patterns without employing domain-specific knowledge. In particular, we establish probability densities on the DTIM that compactly describe the statistical properties of the coordination and interactions among the multiple subjects in a group activity. For each class of group activity, we learn a multi-modal density function on the DTIM; a Maximum a Posteriori (MAP) classifier on the manifold is then designed for recognizing new activities. In addition, we extend the model so that participants can be explicitly distinguished from non-participants. We demonstrate how the framework applies to motions represented by point trajectories as well as to articulated human actions represented by images, and experiments in both settings show the effectiveness of the proposed approach.
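
To make the recognition step concrete, the following is a minimal sketch of MAP classification with per-class multi-modal densities, under stated assumptions: the density is approximated as a kernel mixture centered on training exemplars, and a plain Frobenius distance stands in for the paper's actual Riemannian geodesic on the DTIM. The function names (`geodesic_dist`, `class_density`, `map_classify`), the bandwidth, and the toy activity classes are all illustrative, not taken from the paper.

```python
# Hypothetical sketch of MAP recognition over Temporal Interaction Matrices.
# The metric and density form below are placeholder assumptions, not the
# paper's definitions.
import numpy as np

def geodesic_dist(A, B):
    """Placeholder metric between two Temporal Interaction Matrices.

    The paper endows the DTIM with a Riemannian structure; the Frobenius
    distance is substituted here only to keep the sketch self-contained."""
    return np.linalg.norm(A - B)

def class_density(X, exemplars, bandwidth=1.0):
    """Multi-modal (kernel) density estimate, one mode per training exemplar."""
    d2 = np.array([geodesic_dist(X, E) ** 2 for E in exemplars])
    return np.mean(np.exp(-d2 / (2.0 * bandwidth ** 2)))

def map_classify(X, class_exemplars, priors):
    """MAP rule: argmax over classes c of p(X | c) * p(c)."""
    scores = {c: class_density(X, Es) * priors[c]
              for c, Es in class_exemplars.items()}
    return max(scores, key=scores.get)

# Toy usage: two made-up activity classes, each with a few training matrices.
rng = np.random.default_rng(0)
train = {"handoff": [rng.normal(0, 1, (4, 4)) for _ in range(5)],
         "chase":   [rng.normal(2, 1, (4, 4)) for _ in range(5)]}
priors = {"handoff": 0.5, "chase": 0.5}
query = rng.normal(2, 1, (4, 4))
print(map_classify(query, train, priors))  # -> "chase"
```

Replacing `geodesic_dist` with a true manifold geodesic and fitting the mixture by learning (rather than placing one mode per exemplar) would bring the sketch closer to the density estimation the abstract describes.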