Exploring STIP-based models for recognizing human interactions in TV videos

  • Authors:
  • Manuel J. Marín-Jiménez;Enrique Yeguas;Nicolás Pérez De La Blanca

  • Affiliations:
  • Department of Computer Science and Numerical Analysis, University of Córdoba, 14071 Córdoba, Spain;Department of Computer Science and Numerical Analysis, University of Córdoba, 14071 Córdoba, Spain;Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

Recognizing human motion, whether actions (HAR) or interactions (HIR), in real video data is a very challenging task. In the last few years, models of increasing complexity have been proposed to improve performance on this task. However, it remains unclear whether it is the features or the models that deserve the increase in complexity. In this paper we evaluate this question for the HIR task. To that end, we compare the results of our experiments, which use features based on space-time interest points (STIP) and bag-of-words (BOW) models as a basis, combined with a standard classifier, against some of the most effective recent approaches that use alternative representation models. We perform a comprehensive experimental study on two state-of-the-art HIR databases, TV Human Interactions and UT-Interaction, and compare our results with recent results published on these datasets. In addition, we run cross-dataset experiments on the Hollywood-2 dataset to study how well the trained models generalize across datasets. The most relevant result is that the model combining STIP+BOW is competitive in the HIR task with the most complex ones. It is also shown that the vocabulary learning subtask can be improved by using compression algorithms on a large enough initial set of features. In contrast to other categorization tasks, context does not help: the results show that dense sampling of STIP is the best choice, but only when it is applied inside the region of interest of the interaction.
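
As an illustration only, the STIP+BOW representation described in the abstract can be sketched in a few lines. The abstract does not specify the descriptors, the vocabulary learning algorithm, or the classifier, so everything below is an assumption: plain k-means stands in for vocabulary learning, the function names (`learn_vocabulary`, `bow_histogram`, `inside_roi`) are hypothetical, and the descriptors are toy 2-D vectors rather than real STIP descriptors. The optional `roi` argument mimics restricting features to the region of interest of the interaction.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: squared_distance(descriptor, vocabulary[k]))


def learn_vocabulary(descriptors, num_words, iterations=10, seed=0):
    """Plain k-means (Lloyd's algorithm); an assumed stand-in for the
    vocabulary learning step, not the method used in the paper."""
    rng = random.Random(seed)
    vocabulary = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_word(d, vocabulary)].append(d)
        for k, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster is empty
                vocabulary[k] = [sum(col) / len(members)
                                 for col in zip(*members)]
    return vocabulary


def inside_roi(position, roi):
    """True if (x, y) lies inside the box roi = (x0, y0, x1, y1)."""
    x, y = position
    x0, y0, x1, y1 = roi
    return x0 <= x <= x1 and y0 <= y <= y1


def bow_histogram(features, vocabulary, roi=None):
    """Build an L1-normalised bag-of-words histogram.

    features: list of ((x, y), descriptor) pairs.
    roi: optional box; features outside it are ignored.
    """
    counts = [0] * len(vocabulary)
    for position, descriptor in features:
        if roi is not None and not inside_roi(position, roi):
            continue
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2-D "descriptors".
    training = [[0, 0], [0, 1], [10, 10], [10, 11]]
    vocab = learn_vocabulary(training, num_words=2)
    video = [((1, 1), [0.1, 0.2]), ((2, 2), [9.5, 10.1]), ((50, 50), [9.0, 9.0])]
    print(bow_histogram(video, vocab))                    # all features
    print(bow_histogram(video, vocab, roi=(0, 0, 5, 5)))  # ROI only
```

The resulting histogram is the fixed-length vector that a standard classifier would take as input, one vector per video clip.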