The co-attention model for tiny activity analysis

  • Authors:
  • Longfei Zhang; Ziyu Guan; Alexander Hauptmann

  • Affiliations:
  • School of Software, Beijing Institute of Technology, Beijing, China and School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA; Computer Science Department, University of California Santa Barbara, Santa Barbara, CA, USA; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

  • Venue:
  • Neurocomputing
  • Year:
  • 2013


Abstract

Automatic understanding of human activities is a major challenge in the field of multimedia analysis. The challenge is especially acute for small-scale activities, such as finger motions, and for activities in complex scenes. Under typical camera views, neither global-feature nor local-feature analysis methods are well suited to such activities. To address this problem, many studies rely on spatio-temporal features combined with feature selection methods to obtain a video representation. However, these spatio-temporal features are problematic for two reasons. First, it is unclear whether a given feature corresponds to meaningful foreground or to noise. Second, these features alone cannot predict where an activity will occur. A biologically inspired feature selection method is therefore needed to reorganize these spatio-temporal features and represent the video in a feature space. In this paper, we propose a graph-based Co-Attention model that selects more effective features for activity analysis. Rather than reducing feature dimensionality, our Co-Attention model reduces the number of interest points considered. The model is derived from correlations among individual tiny activities, whose salient regions are identified by combining an integrated top-down and bottom-up visual attention model with a motion attention model built from spatio-temporal features rather than directly from optical flow. Unlike typical attention models, the Co-Attention model allows multiple regions of interest to co-exist in a video for further analysis. Experimental results on the KTH dataset, the YouTube dataset, and a new tiny activity dataset, the Pump dataset, which consists of visual observations of patients operating an infusion pump, show that our activity analysis approach is more effective than state-of-the-art methods.
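
The paper gives the full formulation of the Co-Attention model; purely as an illustration of the selection idea described above, the sketch below shows one way interest points could be filtered by combining a visual saliency map and a motion saliency map. All names and parameters here (select_interest_points, alpha, threshold) are hypothetical and not taken from the paper, and the weighted combination is an assumption, not the authors' method.

```python
import numpy as np

def select_interest_points(points, visual_saliency, motion_saliency,
                           alpha=0.5, threshold=0.6):
    """Keep spatio-temporal interest points that fall in salient regions.

    points          : (N, 2) array of (y, x) interest-point coordinates
    visual_saliency : (H, W) map in [0, 1] from a top-down/bottom-up attention cue
    motion_saliency : (H, W) map in [0, 1] built from spatio-temporal features
    alpha           : assumed weight between the two attention cues
    threshold       : assumed minimum combined saliency for keeping a point
    """
    # Combine the two attention cues into a single co-attention map.
    # Several disjoint regions can exceed the threshold, so multiple
    # regions of interest may co-exist in the same frame.
    combined = alpha * visual_saliency + (1.0 - alpha) * motion_saliency

    ys, xs = points[:, 0].astype(int), points[:, 1].astype(int)
    keep = combined[ys, xs] >= threshold
    return points[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W = 120, 160
    visual = rng.random((H, W))
    motion = rng.random((H, W))
    pts = np.stack([rng.integers(0, H, 50), rng.integers(0, W, 50)], axis=1)
    kept = select_interest_points(pts, visual, motion)
    print(f"kept {len(kept)} of {len(pts)} interest points")
```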