Exploiting language models to recognize unseen actions

  • Authors:
  • Dieu Thu Le;Raffaella Bernardi;Jasper Uijlings

  • Affiliations:
  • University of Trento, Trento, Italy;University of Trento, Trento, Italy;University of Trento, Trento, Italy

  • Venue:
  • Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
  • Year:
  • 2013

Abstract

This paper addresses the problem of human action recognition. Visual action recognition systems typically need visual training examples for every action they are expected to recognize. However, the number of possible actions is staggering: there are many types of actions, and each action type can involve many possible objects. Collecting visual training examples for every combination in this combinatorial explosion is impractical. To address this problem, this paper makes a first attempt at a general framework for unseen action recognition in still images that exploits both visual and language models. Starting from the objects recognized in an image by means of visual features, the system suggests the most plausible actions using off-the-shelf language models. All components in the framework are trained on universal datasets, so the system is general, flexible, and able to recognize actions for which no visual training example has been provided. The paper shows that the model yields good performance on unseen action recognition. It even outperforms a state-of-the-art Bag-of-Words model in a realistic scenario where few visual training examples are available.
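
The idea in the abstract (recognize objects visually, then let a language model propose the most plausible action for them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the scoring function, so the product of detector confidence and P(verb | object) used here is an assumption, and the names `VERB_OBJECT_COUNTS`, `verb_given_object`, and `rank_actions`, as well as all numbers, are hypothetical.

```python
from collections import defaultdict

# Hypothetical verb-object co-occurrence counts, standing in for an
# off-the-shelf language model (numbers are made up for illustration).
VERB_OBJECT_COUNTS = {
    ("ride", "horse"): 80,
    ("feed", "horse"): 20,
    ("ride", "bicycle"): 90,
    ("repair", "bicycle"): 10,
}


def verb_given_object(counts):
    """Turn raw counts into conditional probabilities P(verb | object)."""
    totals = defaultdict(float)
    for (_, obj), n in counts.items():
        totals[obj] += n
    return {(verb, obj): n / totals[obj] for (verb, obj), n in counts.items()}


def rank_actions(detections, counts):
    """Rank (verb, object) pairs by detector confidence * P(verb | object).

    This scoring rule is an assumption made for the sketch.
    """
    probs = verb_given_object(counts)
    scored = [
        (verb, obj, detections[obj] * p)
        for (verb, obj), p in probs.items()
        if obj in detections
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical object detector output: object label -> confidence.
detections = {"horse": 0.9, "bicycle": 0.2}
ranking = rank_actions(detections, VERB_OBJECT_COUNTS)
```

In this sketch no action ever needs its own visual training example: only the object detector sees images, while the verb-object statistics come from text alone, which is what allows an action unseen at visual training time to be proposed.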